Abstract

We consider the problem of identification and authentication based on secret key generation from 
some user-generated source data (e.g., a biometric source). The goal is to reliably identify users
pre-enrolled in a database as well as authenticate them based on the estimated secret key while preserving the
privacy of the enrolled data and of the generated keys. We characterize the optimal tradeoff between the 
identification rate, the compression rate of the users’ source data, information leakage rate, and secret key 
rate. In particular, we provide a coding strategy based on layered random binning which is shown to be 
optimal. In addition, we study a related secure identification/authentication problem where an adversary 
tries to deceive the system using its own data. Here the optimal tradeoff between the identification rate, 
compression rate, leakage rate, and exponent of the maximum false acceptance probability is provided. 
The results reveal a close connection between the optimal secret key rate and the false acceptance 
exponent of the identification/authentication system. 
I. Introduction

Consider an identification and authentication system with $K$ users (see Fig. 1). In the enrollment phase,
each user $w \in \{1, 2, \ldots, K\}$ generates a source sequence $X^n(w)$ and provides it to the system. These
source sequences are compressed into $\mathbf{M} = \{M(w) : w = 1, \ldots, K\}$ and stored in a database. The
compressed user source data will be used as a reference for identification of the enrolled users. At the
same time, the system produces a set of secret keys $\{S(w) : w = 1, \ldots, K\}$, also functions of the
users' source sequences, which will be used as a reference for authentication of the identified user. In
the identification/authentication phase, an a priori unknown user $W$ provides a measurement $Y^n$ to the
system. For example, this could be seen as a noisy version of its enrolled source sequence $X^n(W)$. Based
on the stored database $\mathbf{M}$ and measurement $Y^n$, the user is identified as $\hat{W}$. The system also produces
an estimated key $\hat{S}$. The user is successfully identified and authenticated if $(\hat{W}, \hat{S}) = (W, S(W))$.

Fig. 1. Identification and secret key-based authentication system with an adversary. The enrollment phase is contained in the
green box. The remaining part corresponds to the identification/authentication phase. The red dashed arrow corresponds to an
active adversary which replaces the original user measurement $Y^n$ with its own generated signal $\tilde{y}^n(\mathbf{M}, Z^n)$ in order to gain
access to the system.


The system described above is relevant to several applications, including access control and secure,
trustworthy communication. In database identification for access control applications,
the system identifies an individual as an enrolled user and then grants the corresponding access based on
authentication using the secret key. In other words, the system first finds out which user in the database
the individual corresponds to, and then verifies whether the individual is really the user he/she claims to
be.

One important class of access control applications uses biometric data such as fingerprints,
iris scans, voice, face, and DNA sequences (see, e.g., [1] and references therein). Unlike passwords,
biometric data inherently belong to the users and provide a convenient and seemingly more secure means
of identification/authentication. However, it is crucial that the privacy of the enrolled data be protected
from any inference by an adversary. The privacy risk in this case is potentially high since
biometric data are typically tied to a person's identity. If compromised, they cannot be revoked or
changed easily, unlike a password.

In this work, we consider secret key-based identification and authentication problems in the presence
of an adversary which is not part of the system but has full knowledge of the stored database $\mathbf{M}$
as well as access to some “on-line” side information $Z^n$, as shown in Fig. 1. We refer to $Z^n$ as on-line side
information since it is statistically dependent on the actual user $W$ who is trying to be identified and
authenticated. In contrast, the knowledge of $\mathbf{M}$ can be regarded as “off-line” side information for the
adversary. Two closely related scenarios are studied:

1) The adversary is passive and is only interested in inferring the user source sequence. In this case,
we wish to design a reliable identification/authentication system that achieves maximal identification
rate and secret key rate (see definitions in Section II-A) while minimizing the compression rate
of the stored descriptions and the information leakage of the enrolled source sequences. In general,
there is a tension between these performance metrics. For this scenario, our main contribution is
a single-letter characterization of the optimal tradeoff region of the identification rate, compression
rate, information leakage rate, and key rate for discrete memoryless sources.

2) The adversary is active and tries to deceive the identification/authentication system by using its
own sequence $\tilde{Y}^n = \tilde{y}^n(\mathbf{M}, Z^n)$. We refer to the event where the legitimate user fails
identification/authentication as a false rejection, and to the event where the system accepts the
adversary as a false acceptance. In this case, we wish to design a secure identification/authentication
system that achieves arbitrarily small false rejection probability with maximum identification rate
and: i) minimizes the compression rate of each stored description, ii) minimizes the leakage rate
of each enrolled source, and iii) maximizes the error exponent of the maximum false acceptance
probability (mFAP) (see definitions in Section III-A). For this scenario, our main contribution is
a single-letter characterization of the optimal tradeoff between the identification rate, compression
rate, information leakage rate, and mFAP exponent for discrete memoryless sources.
To motivate the role of key-based authentication for readers who may be unfamiliar with it, we use
the following simple everyday example. Consider the front door of a building with an intercom with
multiple buttons. Each button corresponds to an apartment in the building. An intruder may wish to gain
access to the building by pressing a button at random, hoping that the people inside the corresponding
apartment simply open the door, identifying the intruder as a friend or family member just because he/she pressed their
button. If, instead, the intercom is also equipped with a camera and facial recognition software, the door
will be opened only if the intruder's face (properly projected into some feature space) generates a hash
function value that matches the key corresponding to that apartment. Technically speaking, the
optimal identification problem corresponds to $K$-ary hypothesis testing, which just provides the answer
minimizing the average probability of wrong identification. However, the identified user also needs to be
authenticated (in this case, by showing his/her face) in order to rightfully gain access to the system.
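The identification step alone can be illustrated with a toy $K$-ary hypothesis test. The sketch below is not the scheme studied in this paper: it assumes binary sources, a binary symmetric measurement channel with crossover probability below 1/2, and an uncompressed database, in which case maximum-likelihood identification under a uniform prior reduces to picking the enrolled sequence closest in Hamming distance. All function names and parameters are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(database, measurement):
    """Return the index of the enrolled sequence closest to the measurement.

    For a binary symmetric channel with crossover probability below 1/2 and a
    uniform prior on the users, this minimum-distance rule is the
    maximum-likelihood K-ary hypothesis test.
    """
    return min(range(len(database)), key=lambda w: hamming(database[w], measurement))


def simulate(num_users=8, n=64, crossover=0.1, seed=0):
    """Enroll random binary sequences, measure one through a BSC, identify it."""
    rng = random.Random(seed)
    database = [[rng.randint(0, 1) for _ in range(n)] for _ in range(num_users)]
    true_user = rng.randrange(num_users)
    # Flip each bit of the enrolled sequence independently with prob. `crossover`.
    measurement = [bit ^ (rng.random() < crossover) for bit in database[true_user]]
    return true_user, identify(database, measurement)
```

Identification by itself says nothing about whether the presented measurement is genuine, which is why the system additionally compares an estimated key against the enrolled one.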

Related Work 

Authentication problems have been studied from an information-theoretic perspective in several
directions. Maurer [2] considered the message authentication problem in connection with the hypothesis
testing problem, where the underlying message probability distributions of the legitimate user and the adversary
are assumed to be different. Martinian et al. [3] considered authentication with a distortion criterion. More
recently, some works have considered authentication based on secret key generation [4], which is closely
related to fuzzy extractors [5]. These include, for example, Lai et al., and Ignatenko and Willems [6],
[7], [8], [9], [10], which focused on biometric authentication systems [11] where privacy of the enrolled
data is also taken into account. In [12], we considered a general case where the adversary has correlated
side information and we provided a complete characterization of the fundamental tradeoff. Analysis
of deception probability in authentication systems from an adversary's perspective was also considered
in [1 ]. Closely related to the secret key-based authentication problem with privacy constraint are the
problems of source coding with privacy constraint, e.g., [14], [15], where the goal is to reconstruct
the source reliably while preserving the privacy of the source or the reconstruction sequences from the
inference of an eavesdropper.

Extending the single-user authentication problem to the identification/authentication problem in the
multi-user case adds another dimension to the problem, namely the identification
rate. A database identification problem for biometric data was considered in [16], [17], where the noisy
measurements of all user data are treated as a database and the maximum identification rate was
characterized. Later, Tuncel [1 ] considered the problem where the database is a compressed version of the user
data and showed the optimal tradeoff between identification rate and compression rate. Recently, this was
extended to include also a lossy reconstruction constraint at the decoder [19]. Ignatenko and Willems [20]
studied the problem of user identification together with secret key-based authentication under a privacy
constraint, extending the secret key-based authentication problem to the multi-user setting.

Contribution and Organization 

In this work we extend the setting of [2< ] to a more general case, including a compression rate constraint
on the source description and allowing the adversary to have access to correlated side information. The
setting of this paper can also be viewed as a multi-user extension of our previous work [12]. Correlated
side information at the adversary, as treated here, is of practical interest since it models scenarios
where the adversary can have access to a noisy version of the source data. In Section II, we study
secret key-based identification with a privacy constraint and provide a complete characterization of the
identification-compression-leakage-key rate region $\mathcal{R}_1$ for discrete memoryless sources. It is shown that
the layered binning scheme with rate allocation between compression and identification only on the
first-layer description is optimal. The result includes many other results as special cases, one of which is the
compression-leakage-key rate region for the secret key-based authentication problem in [12]. Binary examples
illustrating the derived tradeoffs are also provided. In Section III, we study a secure identification problem
with a privacy constraint and provide a complete characterization of the
identification-compression-leakage-mFAP exponent region $\mathcal{R}_2$ for discrete memoryless sources. Our results show that the maximum
key rate in $\mathcal{R}_1$ is equivalent to the maximum mFAP exponent in $\mathcal{R}_2$, revealing a connection between
the secret key rate and the security of the identification/authentication system.

Notation: We denote discrete random variables, their corresponding realizations or deterministic values,
and their alphabets by upper case, lower case, and calligraphic letters, respectively. $X_m^n$ denotes the
sequence $\{X_m, \ldots, X_n\}$ when $m \le n$, and the empty set otherwise. Also, we use the shorthand notation
$X^n$ for $X_1^n$. The term $X^{n \setminus i}$ denotes the set $\{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$. When a random variable $X$
is constant we write $X = \emptyset$. A length-$K$ vector of descriptions $(M(1), \ldots, M(K))$ is denoted by $\mathbf{M}$,
where $\mathbf{M}_{\setminus W}$ is the vector $(M(1), \ldots, M(W-1), M(W+1), \ldots, M(K))$. The cardinality of the set $\mathcal{X}$ is
denoted by $|\mathcal{X}|$. We use $[1 : N]$ to denote the index set $\{1, 2, \ldots, N\}$. Finally, we use $X - Y - Z$ to
indicate that $(X, Y, Z)$ forms a Markov chain. Other notations follow the standard ones in [21].
II. Secret Key-based Identification/Authentication with a Privacy Constraint
A. Problem Formulation 

Let us consider a secret key-based identification and authentication system as shown in Fig. 1. The source,
measurement, and side information alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$ are finite sets. The users' source sequences $X^n(w)$
for $w \in \mathcal{W}^{(n)} = [1 : K]$ are independent across the users and have i.i.d. components distributed according
to some fixed source distribution $P_X$. In the enrollment phase, an encoder generates a description $M(w)$
and a secret key $S(w)$ based on $X^n(w)$, for each $w \in \mathcal{W}^{(n)}$. The descriptions are stored in a
database for later identification and authentication. In the identification/authentication phase, an arbitrary
unknown user $W \in \mathcal{W}^{(n)}$, independent of the enrolled source sequences and stored database, presents
itself to the system and generates a measurement sequence $Y^n$ jointly distributed with $X^n(W)$. Based on
$Y^n$ and the stored database $\mathbf{M} = (M(1), M(2), \ldots, M(K))$, a decoder identifies the observed user as
$\hat{W}$ and generates an estimate $\hat{S}$ of the key. The identification and authentication operation is successful
if $(\hat{W}, \hat{S}) = (W, S(W))$.

We consider an adversary which has access to the whole database $\mathbf{M}$ and to a side information sequence
$Z^n$ also jointly distributed with $X^n(W)$. The information leakage rate of user $W$ at the adversary is
measured by the mutual information rate $I(X^n(W); \mathbf{M}, Z^n)/n$. Similarly, the key leakage rate of user
$W$ at the adversary is measured by the mutual information rate $I(S(W); \mathbf{M}, Z^n)/n$.

In this work we assume that $(X^n(W), Y^n, Z^n)$ are memoryless (with respect to the sequence index $i$)
with the $i$-th marginal joint distribution $P_{X,Y,Z} = P_X P_{Y,Z|X}$, where $P_{Y,Z|X}$ is a given transition
probability distribution of a discrete memoryless broadcast channel (see Fig. 1). In contrast, for all $w \neq W$,
the triples $(X^n(w), Y^n, Z^n)$ are memoryless with the $i$-th marginal distribution $P_X P_{Y,Z}$, where $P_{Y,Z}$ is
the $(Y,Z)$-marginal distribution of $P_{X,Y,Z}$.

We are interested in characterizing the optimal tradeoff between the identification rate, compression 
rate, information leakage rate, and secret key rate, defined as follows: 

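Definition 1: A $(|\mathcal{W}^{(n)}|, |\mathcal{M}^{(n)}|, |\mathcal{S}^{(n)}|, n)$-code for secret key-based identification and authentication
with a privacy constraint consists of

• A set of stochastic encoders, one for each $w \in \mathcal{W}^{(n)}$, such that the $w$-th encoder takes $X^n(w)$ as input and
generates $(M(w), S(w)) \in \mathcal{M}^{(n)} \times \mathcal{S}^{(n)}$ according to a conditional PMF $p(m(w), s(w) | x^n(w))$.

• An identification decoder mapping $(\mathcal{M}^{(n)})^K \times \mathcal{Y}^n \to \mathcal{W}^{(n)}$, whose output on $(\mathbf{M}, Y^n)$ is the identified user $\hat{W}$.

• A key decoder $g^{(n)} : (\mathcal{M}^{(n)})^K \times \mathcal{Y}^n \to \mathcal{S}^{(n)}$ such that the estimated secret key is $\hat{S} = g^{(n)}(\mathbf{M}, Y^n)$.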
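Definition 2: An identification-compression-leakage-key rate tuple $(R_I, R_C, L, R_S) \in \mathbb{R}_+^4$ is said to be
achievable if, for any $\delta > 0$, there exists a sequence of $(|\mathcal{W}^{(n)}|, |\mathcal{M}^{(n)}|, |\mathcal{S}^{(n)}|, n)$-codes such that, for all
sufficiently large $n$,

$$\max_{w \in \mathcal{W}^{(n)}} P\big((\hat{W}, \hat{S}) \neq (W, S(W)) \mid W = w\big) \le \delta, \tag{1}$$

$$\frac{1}{n} \log |\mathcal{W}^{(n)}| \ge R_I - \delta, \tag{2}$$

$$\frac{1}{n} \log |\mathcal{M}^{(n)}| \le R_C + \delta, \tag{3}$$

$$\max_{w \in \mathcal{W}^{(n)}} \frac{1}{n} I(X^n(W); \mathbf{M}, Z^n \mid W = w) \le L + \delta, \tag{4}$$

$$\max_{w \in \mathcal{W}^{(n)}} \frac{1}{n} I(S(W); \mathbf{M}, Z^n \mid W = w) \le \delta, \tag{5}$$

$$\min_{w \in \mathcal{W}^{(n)}} \frac{1}{n} H(S(w)) \ge R_S - \delta. \tag{6}$$

The identification-compression-leakage-key rate region $\mathcal{R}_1$ is defined as the closure of all achievable tuples.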

B. Results 

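Theorem 1: The region $\mathcal{R}_1$ for the identification/authentication problem defined above is given by the
set of all tuples $(R_I, R_C, L, R_S) \in \mathbb{R}_+^4$ such that

$$R_I \le I(Y;U), \tag{7}$$

$$R_C \ge R_I + I(X;V|Y), \tag{8}$$

$$L \ge I(X;V,Y) - I(X;Y|U) + I(X;Z|U), \tag{9}$$

$$R_S \le I(V;Y|U) - I(V;Z|U), \tag{10}$$

for some joint distributions of the form $P_{X,Y,Z} P_{V|X} P_{U|V}$ with $|\mathcal{U}| \le |\mathcal{X}| + 4$, $|\mathcal{V}| \le (|\mathcal{X}| + 4)(|\mathcal{X}| + 2)$. □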


By a standard time-sharing argument [22], it is immediate to show that $\mathcal{R}_1$ is convex.
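To get a feel for the bounds in Theorem 1, the right-hand sides can be evaluated numerically for a fixed choice of auxiliary variables. The sketch below is only an illustration and is not the binary example of this paper: it assumes a uniform binary source and binary symmetric test channels $V = X \oplus N_v$, $U = V \oplus N_u$, $Y = X \oplus N_y$, $Z = X \oplus N_z$ with independent noises, which satisfies the Markov chain $U - V - X - (Y, Z)$. All function and key names are hypothetical.

```python
import itertools
import math

NAMES = "XVUYZ"  # order of the coordinates in each outcome tuple


def binary_entropy(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def joint_pmf(p_v, p_u, p_y, p_z):
    """Joint pmf of (X, V, U, Y, Z) for the assumed binary symmetric model."""
    pmf = {}
    for x, nv, nu, ny, nz in itertools.product((0, 1), repeat=5):
        prob = 0.5  # X ~ Bern(1/2)
        for noise, p in ((nv, p_v), (nu, p_u), (ny, p_y), (nz, p_z)):
            prob *= p if noise else 1.0 - p
        outcome = (x, x ^ nv, x ^ nv ^ nu, x ^ ny, x ^ nz)
        pmf[outcome] = pmf.get(outcome, 0.0) + prob
    return pmf


def entropy(pmf, variables):
    """Joint entropy (bits) of the named variables, e.g. entropy(pmf, "XY")."""
    idx = [NAMES.index(v) for v in variables]
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mi(pmf, a, b, c=""):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def theorem1_bounds(pmf):
    """Right-hand sides of (7)-(10) for one fixed choice of (U, V)."""
    return {
        "R_I_max": cond_mi(pmf, "Y", "U"),
        "R_C_minus_R_I_min": cond_mi(pmf, "X", "V", "Y"),
        "L_min": (cond_mi(pmf, "X", "VY") - cond_mi(pmf, "X", "Y", "U")
                  + cond_mi(pmf, "X", "Z", "U")),
        "R_S_max": cond_mi(pmf, "V", "Y", "U") - cond_mi(pmf, "V", "Z", "U"),
    }
```

For instance, `theorem1_bounds(joint_pmf(0.0, 0.5, 0.1, 0.2))` corresponds to $V = X$ and $U$ independent of everything else, so the identification bound collapses to zero while the key bound becomes $I(X;Y) - I(X;Z)$. Sweeping the crossover probabilities traces out boundary points for this restricted family only; the full region requires optimizing over all admissible $(U, V)$.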

Before giving the proof of Theorem 1, some remarks are in order. 

Remark 1 (Layered random binning): Binning usually helps to reduce the rate needed for compression.
In the related identification problem [19], the authors showed that the binning scheme is optimal when
an additional reconstruction constraint is included. As we shall see in the proof of Theorem 1, layered
binning turns out to be optimal also in the presence of an information leakage constraint towards an
adversary with access to correlated side information. Interestingly, we note that the obtained tradeoff
between compression and identification rates in (8) results from the rate allocation, which is applied only
to the first-layer codeword.

Remark 2 (Special cases): Theorem 1 recovers results of several special cases in the literature.
i) When there is only one user in the database, i.e., $|\mathcal{W}^{(n)}| = 1$, the problem reduces to authentication
with a privacy constraint studied in [12] (see, e.g., Fig. 2). It can also be viewed as an extension of the
secret key agreement problem with one-way communication constraint [23] to include an information
leakage constraint. By setting $R_I = 0$ in $\mathcal{R}_1$, we obtain the compression-leakage-key rate region consisting
of all tuples $(R_C, L, R_S)$ such that

$$R_C \ge I(X;V|Y),$$

$$L \ge I(X;V,Y) - I(X;Y|U) + I(X;Z|U),$$

$$R_S \le I(V;Y|U) - I(V;Z|U),$$

for some joint distributions of the form $P_{X,Y,Z} P_{V|X} P_{U|V}$ with $|\mathcal{U}| \le |\mathcal{X}| + 3$, $|\mathcal{V}| \le (|\mathcal{X}| + 3)(|\mathcal{X}| + 2)$.


Fig. 2. Secret key generation for authentication with a privacy constraint.

ii) When restricting to the case without secret key-based authentication ($R_S = 0$), the problem reduces
to identification with a privacy constraint. By setting $V = U$ in $\mathcal{R}_1$, we obtain the
identification-compression-leakage rate region consisting of all tuples $(R_I, R_C, L)$ such that

$$R_I \le I(Y;U),$$

$$R_C \ge R_I + I(X;U|Y),$$

$$L \ge I(X;U,Z),$$

for some joint distributions of the form $P_{X,Y,Z} P_{U|X}$. Furthermore, without the leakage constraint, this
result recovers the optimal compression-identification rate (capacity/storage) tradeoff in [1 ], [19].

iii) When there is no compression rate constraint (i.e., $R_C = H(X)$) and, furthermore, the adversary
has no “on-line” side information (i.e., $Z = \emptyset$), the region reduces to the set of all tuples $(R_I, L, R_S)$
such that

$$R_I \le I(Y;U),$$

$$L \ge I(X;V,Y) - I(X;Y|U) = I(X;V|Y) + I(Y;U),$$

$$R_S \le I(V;Y|U) = I(V;Y) - I(Y;U),$$

for some joint distributions of the form $P_{X,Y} P_{V|X} P_{U|V}$. By setting $R_I = I(Y;U)$ in the expression
above (thus restricting the region to a potentially smaller set), we recover the result in [20], i.e., we
obtain an achievable region that coincides with the region derived in [20].
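The two equalities appearing in case iii) can be checked directly from the Markov chain $U - V - X - Y$. The following short derivation is a sketch added for convenience; it uses only the chain rule and the Markov relations stated in the text:

```latex
\begin{align*}
I(X;V,Y) - I(X;Y|U)
  &= I(X;Y) + I(X;V|Y) - \bigl[H(Y|U) - H(Y|X,U)\bigr] \\
  &= I(X;Y) + I(X;V|Y) - \bigl[H(Y|U) - H(Y|X)\bigr]
     && \text{since } U - X - Y \\
  &= I(X;V|Y) + I(Y;U), \\[4pt]
I(V;Y|U)
  &= H(Y|U) - H(Y|U,V) \\
  &= H(Y|U) - H(Y|V)
     && \text{since } U - V - Y \\
  &= I(V;Y) - I(Y;U).
\end{align*}
```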

Proof of Theorem 1: Achievability is proved using a random coding argument, where we use
the definitions and properties of $\epsilon$-typicality as in [21].

Achievability: Our achievable scheme utilizes layered coding, binning, and subbinning as illustrated
in Fig. 3 (see footnote 1). Fix $P_{V|X}$ and $P_{U|V}$. Let $\epsilon$ and $\delta_\epsilon$ be positive real numbers where $\delta_\epsilon \to 0$ as $\epsilon \to 0$. Assume
that $I(V;Y|U) - I(V;Z|U) > 0$. Note also that the joint distribution $P_{X,Y,Z} P_{V|X} P_{U|V}$ implies that
$U - V - X - (Y, Z)$ forms a Markov chain.

1) Codebook generation: Randomly and independently generate codewords $u^n(j)$ for $j \in [1 : 2^{n(I(X;U)+\delta_\epsilon)}]$
according to the product distribution $\prod_{i=1}^n P_U(u_i)$. Choosing some identification rate

$$R_I < I(U;Y) - \delta_\epsilon, \tag{11}$$

we distribute the codewords uniformly at random into $2^{n(I(X;U|Y)+R_I+2\delta_\epsilon)}$ bins $b_U(m_1)$, $m_1 \in [1 :
2^{n(I(X;U|Y)+R_I+2\delta_\epsilon)}]$. Each bin contains $2^{n(I(U;Y)-R_I-\delta_\epsilon)}$ codewords, each indexed by $m'$, where $I(X;U) -
I(X;U|Y) = I(U;Y)$ follows from the fact that $U - X - Y$. There exists a one-to-one mapping between
index $j$ and the pair of bin/codeword indices $(m_1, m')$ such that, without loss of generality, we can
identify $j = (m_1, m')$.

For each $j$, randomly and conditionally independently generate codewords $v^n(j,k)$ for $k \in [1 :
2^{n(I(X;V|U)+\delta_\epsilon)}]$, according to the conditional product distribution $\prod_{i=1}^n P_{V|U}(v_i | u_i(j))$, and distribute
these codewords uniformly at random into $2^{n(I(X;V|U,Y)+3\delta_\epsilon)}$ bins $b_V(j, m_2)$, $m_2 \in [1 : 2^{n(I(X;V|U,Y)+3\delta_\epsilon)}]$.
Each bin $b_V(j, m_2)$ contains $2^{n(I(V;Y|U)-2\delta_\epsilon)}$ codewords, where $I(X;V|U) - I(X;V|U,Y) = I(V;Y|U)$
follows from the fact that $U - V - X - Y$. Moreover, the codewords of each bin $b_V(j, m_2)$ are distributed
uniformly at random into subbins, indexed by $s$, where $s \in [1 : 2^{n(I(V;Y|U)-I(V;Z|U)-\delta_\epsilon)}]$. The index
$s$ here represents a subbin index of the second-layer bin. In each subbin, there are $2^{n(I(V;Z|U)-\delta_\epsilon)}$
codewords, each indexed by $s'$. There exists a one-to-one mapping between index $k$ and the triple
of bin/subbin/codeword indices $(m_2, s, s')$ such that, without loss of generality, we can identify $k =
(m_2, s, s')$.

Fig. 3. Layered binning; rate allocation between compression and identification only applies to the first-layer codewords,
i.e., $m_1 \in [1 : 2^{n(I(X;U|Y)+R_I+2\delta_\epsilon)}]$. The bin indices $(m_1, m_2)$ are sent to the decoder as helper data. The secret key is
chosen as a subbin index $s$ of the chosen codeword $v^n$.

Footnote 1: Intuition for our achievable scheme is as follows. We use two layers of codewords $\{U^n\}$ and $\{V^n\}$ to be able to adapt to the
presence of the adversary by controlling the information leakage via the descriptions $\mathbf{M}$. Since the decoder has side information
$Y^n$, binning is used to reduce the compression rate at each layer. This also essentially reduces the information leakage rate.
Moreover, we divide the second-layer bin into subbins for the secret key in order to prevent key leakage. We note that the
availability of side information $Z^n$ at the adversary has an impact on the structure of the achievable scheme. If $Z^n$ becomes
degenerate, i.e., $Z = \emptyset$, then it can be shown that the second layer of codewords and subbinning are not required to achieve the
optimal identification-compression-leakage-key rate region.
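The bookkeeping of the index sets in the codebook construction can be summarized by their per-symbol exponents. The sketch below is an illustration with hypothetical function and argument names: it takes the relevant mutual information values (in bits) as inputs, uses $I(X;U|Y) = I(X;U) - I(U;Y)$ and $I(X;V|U,Y) = I(X;V|U) - I(V;Y|U)$, and assumes the subbin size exponent $I(V;Z|U) - \delta_\epsilon$ that appears as the bound on $H(S'(w))$ in the key rate analysis.

```python
def layered_binning_exponents(i_xu, i_uy, i_xv_u, i_vy_u, i_vz_u, r_i, delta):
    """Per-symbol exponents (bits) of the index sets used in the layered codebook.

    Arguments are I(X;U), I(U;Y), I(X;V|U), I(V;Y|U), I(V;Z|U), the chosen
    identification rate R_I, and the slack delta_epsilon.
    """
    if not 0 <= r_i < i_uy - delta:
        raise ValueError("need 0 <= R_I < I(U;Y) - delta, cf. (11)")
    if i_vy_u - i_vz_u <= 0:
        raise ValueError("need I(V;Y|U) - I(V;Z|U) > 0")
    return {
        "j": i_xu + delta,                          # first-layer codewords u^n(j)
        "m1": (i_xu - i_uy) + r_i + 2 * delta,      # first-layer bins (stored)
        "m_prime": i_uy - r_i - delta,              # codewords per first-layer bin
        "k": i_xv_u + delta,                        # second-layer codewords per j
        "m2": (i_xv_u - i_vy_u) + 3 * delta,        # second-layer bins (stored)
        "s": i_vy_u - i_vz_u - delta,               # subbins per bin (secret key)
        "s_prime": i_vz_u - delta,                  # codewords per subbin
    }


def compression_rate(exponents):
    """Rate of the stored description (m1, m2)."""
    return exponents["m1"] + exponents["m2"]
```

The exponents are consistent by construction: `m1 + m_prime` equals `j`, and `m2 + s + s_prime` equals `k`, which mirrors the one-to-one mappings $j = (m_1, m')$ and $k = (m_2, s, s')$.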

2) Enrollment: For each user $w \in [1 : K]$, given $X^n(w) = x^n(w)$, the encoder looks for $u^n(j)$ that
is jointly typical with $x^n(w)$ and then for $v^n(j, k)$ that is jointly typical with $(x^n(w), u^n(j))$. From the
covering lemma [21], such a codeword pair exists with high probability since there are more than
$2^{nI(X;U)}$ codewords $u^n(j)$ and, for each $j$, there are more than $2^{nI(X;V|U)}$ codewords $v^n(j, k)$. If there is
more than one such pair, the encoder selects one of them uniformly at random. Let the chosen codeword
indices of user $w$ be denoted by $(j(w), k(w)) = (m_1(w), m'(w), m_2(w), s(w), s'(w))$. The encoder stores
the corresponding bin indices $m_1(w)$ and $m_2(w)$ in the database as the stored description of user $w$.
The compression rate of each user is thus given by

$$\begin{aligned} R_C &= I(X;U|Y) + R_I + I(X;V|U,Y) + 5\delta_\epsilon \\ &= I(X;V|Y) + R_I + 5\delta_\epsilon, \end{aligned} \tag{12}$$

where the second equality follows from the chain rule $I(X;U|Y) + I(X;V|U,Y) = I(X;U,V|Y)$ and
from the fact that $I(X;U,V|Y) = I(X;V|Y)$ due to $U - (V,Y) - X$. The secret key corresponding to user
$w$ is given by the subbin index $s(w)$ in which the chosen sequence $v^n(j(w), k(w)) \in b_V(j(w), m_2(w))$
is found.

3) Identification/Authentication: Given an arbitrary user $W \in [1 : K]$, let $y^n$ denote the realization of
the corresponding measurement sequence $Y^n$.

The decoder has bin indices $(\mathbf{m}_1, \mathbf{m}_2) = ((m_1(1), \ldots, m_1(K)), (m_2(1), \ldots, m_2(K)))$ and observes
$y^n$. Then, for each $w \in [1 : K]$, it looks into the corresponding bins $b_U(m_1(w))$ and $b_V(m_1(w), m', m_2(w))$
for all indices $m'$, and checks whether there exists a codeword pair

$$\big(u^n(m_1(w), m'),\, v^n(m_1(w), m', m_2(w), s, s')\big)$$

jointly typical with $y^n$ for some $m', s, s'$. Suppose that there exists a unique $\hat{w}$ for which this condition
holds. Then, the decoder outputs the identified user $\hat{w}$. Otherwise, if no user index or more than one user index
satisfies the condition, an identification failure is declared. Suppose that such a unique $\hat{w}$ is found. Then,
the decoder also outputs $\hat{s}$, the $s$-index of one of the codeword pairs

$$\big(u^n(m_1(\hat{w}), m'),\, v^n(m_1(\hat{w}), m', m_2(\hat{w}), s, s')\big)$$

satisfying the typicality condition. If there is more than one such index $s$, one is chosen at random.
Finally, the decoder compares $\hat{s}$ with $s(w)$, and declares the identified user as successfully authenticated
if $\hat{s} = s(w)$.

Let $U^n(M_1(w), M')$ and $V^n(M_1(w), M', M_2(w), S(w), S')$ be the codewords chosen at the encoder
for each $w \in [1 : K]$ in the enrollment phase, and $(\mathbf{M}_1, \mathbf{M}_2)$ be the corresponding index vectors of
the bins stored in the database. We note that, by symmetry of the codebook generation, the analysis of
identification/authentication error does not depend on which user $W \in [1 : K]$ is presented to the system.
Suppose that a user $W = w$ is present. The relevant identification/authentication error events are:

$E_0$: $\{(Y^n, U^n(M_1(w), M'), V^n(M_1(w), M', M_2(w), S(w), S')) \notin \mathcal{T}_\epsilon^{(n)}(Y, U, V)\}$

$E_{Id}$: $\{(Y^n, U^n(M_1(w'), m'), V^n(M_1(w'), m', M_2(w'), s, s')) \in \mathcal{T}_\epsilon^{(n)}(Y, U, V)$ for some $w' \neq w$ and $m', s, s'\}$

$E_{Au}$: $\{(Y^n, U^n(M_1(w), M'), V^n(M_1(w), M', M_2(w), s, s')) \in \mathcal{T}_\epsilon^{(n)}(Y, U, V)$ for some $s \neq S(w)$ and $s'\}$.

For any $W = w$, by the LLN, $(X^n, U^n(M_1(w), M'), V^n(M_1(w), M', M_2(w), S(w), S'), Y^n, Z^n)$ are
jointly typical with high probability. Thus, $P(E_0) \to 0$ as $n \to \infty$. From the packing lemma [21],
$P(E_{Id} \cup E_{Au}) \to 0$ as $n \to \infty$ if $\frac{1}{n} \log (K \cdot |\mathcal{M}'||\mathcal{S}||\mathcal{S}'|) < I(Y;U,V)$ and $\frac{1}{n} \log |\mathcal{S}||\mathcal{S}'| < I(Y;V|U)$,
where $K = 2^{nR_I}$. These conditions are satisfied by the code construction.
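The statement that the code construction satisfies both packing conditions can be verified by adding up the exponents of the index sets. In the sketch below, the size of the set of indices $m'$ and the joint size of the sets of indices $(s, s')$ are taken to be the per-bin codeword counts from the codebook generation step, and the number of users is assumed to grow as $K = 2^{nR_I}$:

```latex
\begin{align*}
\frac{1}{n}\log\bigl(K\,|\mathcal{M}'|\,|\mathcal{S}|\,|\mathcal{S}'|\bigr)
  &= R_I + \bigl(I(U;Y) - R_I - \delta_\epsilon\bigr)
         + \bigl(I(V;Y|U) - 2\delta_\epsilon\bigr) \\
  &= I(Y;U,V) - 3\delta_\epsilon \;<\; I(Y;U,V), \\[4pt]
\frac{1}{n}\log\bigl(|\mathcal{S}|\,|\mathcal{S}'|\bigr)
  &= I(V;Y|U) - 2\delta_\epsilon \;<\; I(Y;V|U).
\end{align*}
```

Note that the identification rate $R_I$ cancels in the first condition: a larger user population is paid for by fewer codewords per first-layer bin, which is where the rate allocation between compression and identification enters.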

Note that here we show that the average probability of identification/authentication error $P((\hat{W}, \hat{S}) \neq
(W, S(W)))$ can be made arbitrarily small for $n$ sufficiently large. Using the expurgation argument with
respect to the user index $W$, it can be shown that the maximal error probability of identification/authentication
$\max_{w \in \mathcal{W}^{(n)}} P((\hat{W}, \hat{S}) \neq (W, S(W)) \mid W = w)$ can also be made arbitrarily small at the same asymptotic
identification rate $R_I$.

Before proceeding with the analysis of leakage rate, we give a lemma which provides a bound on the 
n-letter conditional entropy based on properties of jointly typical sequences. 

Lemma 1: Let $J(w)$ be the index of codeword $U^n$. If $P((U^n(J(w)), Z^n) \in \mathcal{T}_\epsilon^{(n)}) \to 1$ as $n \to \infty$,
we have that $H(Z^n | J(w)) \le n(H(Z|U) + \delta_\epsilon)$.

Proof: The proof is given in Appendix A. ■

Information leakage analysis: For any $W = w \in [1 : K]$, the information leakage averaged over all
randomly chosen codebooks $\mathcal{C}_n$ can be bounded as follows.

$$\begin{aligned}
I(X^n(w); \mathbf{M}_1, \mathbf{M}_2, Z^n | \mathcal{C}_n)
&\overset{(a)}{=} I(X^n(w); M_1(w), M_2(w), Z^n | \mathcal{C}_n) \\
&= H(X^n(w)) - H(X^n(w) | M_1(w), M_2(w), Z^n, \mathcal{C}_n) \\
&\le nH(X) - H(X^n(w) | J(w), Z^n, \mathcal{C}_n) + H(M_2(w) | \mathcal{C}_n) \\
&\le nH(X) - H(X^n(w), Z^n) + H(J(w) | \mathcal{C}_n) + H(Z^n | J(w), \mathcal{C}_n) + H(M_2(w) | \mathcal{C}_n) \\
&\overset{(b)}{\le} -nH(Z|X) + n(I(X;U) + \delta_\epsilon) + n(H(Z|U) + \delta_\epsilon) + n(I(X;V|U,Y) + 3\delta_\epsilon) \\
&\overset{(c)}{\le} n(I(X;V,Y) - I(X;Y|U) + I(X;Z|U) + \delta'_\epsilon),
\end{aligned}$$

where (a) follows since, given the codebook, $X^n(w) - (M_1(w), M_2(w), Z^n) - (\mathbf{M}_{1 \setminus w}, \mathbf{M}_{2 \setminus w})$ forms a
Markov chain, (b) follows from the memoryless property of the sources, from the codebook generation
where $J(w) \in [1 : 2^{n(I(X;U)+\delta_\epsilon)}]$ and $M_2(w) \in [1 : 2^{n(I(X;V|U,Y)+3\delta_\epsilon)}]$, and from bounding the term
$H(Z^n | J(w))$ as in Lemma 1, and (c) follows from the Markov chain $U - V - X - (Y, Z)$ for some
$\delta'_\epsilon \ge 5\delta_\epsilon$. The information leakage constraint is satisfied if

$$L \ge I(X;V,Y) - I(X;Y|U) + I(X;Z|U). \tag{13}$$
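Step (c) of the leakage bound compresses two uses of the Markov chain $U - V - X - (Y, Z)$ into one line. The following worked identities are a sketch added to spell them out; they rely only on the chain rule and on the Markov relations stated in the text:

```latex
\begin{align*}
H(Z|U) - H(Z|X) &= H(Z|U) - H(Z|X,U) = I(X;Z|U)
   && \text{since } U - X - Z, \\
I(X;V,Y) &= I(X;U,V,Y) = I(X;U) + I(X;Y|U) + I(X;V|U,Y)
   && \text{since } U - (V,Y) - X,
\end{align*}
so that
\begin{align*}
&-H(Z|X) + I(X;U) + H(Z|U) + I(X;V|U,Y) + 5\delta_\epsilon \\
&\qquad = I(X;V,Y) - I(X;Y|U) + I(X;Z|U) + 5\delta_\epsilon .
\end{align*}
```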

Key rate analysis: For any $W = w \in [1 : K]$, we consider the following bound on the secret key rate.

$$\begin{aligned}
H(S(w) | \mathcal{C}_n) &\ge H(S(w) | J(w), M_2(w), S'(w), \mathcal{C}_n) \\
&= H(S(w), J(w), M_2(w), S'(w) | \mathcal{C}_n) - H(J(w), M_2(w), S'(w) | \mathcal{C}_n) \\
&\overset{(a)}{\ge} H(U^n, V^n | \mathcal{C}_n) - H(J(w) | \mathcal{C}_n) - H(M_2(w) | \mathcal{C}_n) - H(S'(w) | \mathcal{C}_n) \\
&\overset{(b)}{\ge} n(I(X;U,V) - 2\delta_\epsilon) - n(I(X;U) + \delta_\epsilon) - n(I(X;V|U,Y) + 3\delta_\epsilon) - n(I(V;Z|U) - \delta_\epsilon) \\
&\ge n(I(Y;V|U) - I(Z;V|U) - \delta'_\epsilon),
\end{aligned}$$

where (a) follows since, given the codebook, codewords $(U^n, V^n)$ are functions of $(J(w), K(w)) =
(J(w), M_2(w), S(w), S'(w))$, and (b) follows from the codebook generation where $J(w) \in [1 : 2^{n(I(X;U)+\delta_\epsilon)}]$
and $M_2(w) \in [1 : 2^{n(I(X;V|U,Y)+3\delta_\epsilon)}]$, and since the probability of a specific pair $(u^n, v^n)$ being selected in the
enrollment can be bounded by $p(u^n, v^n) \le \sum_{x^n(w) \in \mathcal{T}_\epsilon^{(n)}(X | u^n, v^n)} p(x^n(w)) \le 2^{-n(I(X;U,V) - 2\delta_\epsilon)}$, where
the last inequality follows from properties of jointly typical sequences. Therefore, the key rate constraint
is satisfied if

$$R_S \le I(Y;V|U) - I(Z;V|U). \tag{14}$$
Key leakage analysis: For any $W = w \in [1 : K]$, the key leakage averaged over all possible codebooks
can be bounded as follows.

$$\begin{aligned}
I(S(w); \mathbf{M}_1, \mathbf{M}_2, Z^n | \mathcal{C}_n)
&\overset{(a)}{=} I(S(w); M_1(w), M_2(w), Z^n | \mathcal{C}_n) \\
&\le H(S(w) | \mathcal{C}_n) - H(S(w) | J(w), M_2(w), Z^n, \mathcal{C}_n) \\
&= H(S(w) | \mathcal{C}_n) - H(S(w), J(w), M_2(w), Z^n | \mathcal{C}_n) + H(J(w), M_2(w), Z^n | \mathcal{C}_n) \\
&\le H(S(w) | \mathcal{C}_n) - H(S(w), J(w), M_2(w), Z^n, S'(w) | \mathcal{C}_n) + H(S'(w) | S(w), J(w), M_2(w), Z^n, \mathcal{C}_n) \\
&\quad + H(J(w) | \mathcal{C}_n) + H(M_2(w) | \mathcal{C}_n) + H(Z^n | J(w), \mathcal{C}_n) \\
&\overset{(b)}{\le} H(S(w) | \mathcal{C}_n) - H(U^n, V^n, Z^n | \mathcal{C}_n) + n\epsilon_n + H(J(w) | \mathcal{C}_n) + H(M_2(w) | \mathcal{C}_n) + H(Z^n | J(w), \mathcal{C}_n) \\
&\overset{(c)}{\le} H(S(w) | \mathcal{C}_n) - n(I(X;U,V) + H(Z|U,V) - 2\delta_\epsilon) + n\epsilon_n + n(I(X;U) + \delta_\epsilon) \\
&\quad + n(I(X;V|U,Y) + 3\delta_\epsilon) + n(H(Z|U) + \delta_\epsilon) \\
&\overset{(d)}{\le} n\delta''_\epsilon,
\end{aligned} \tag{15}$$

where (a) follows since, given the codebook, $S(w) - (M_1(w), M_2(w), Z^n) - (\mathbf{M}_{1 \setminus w}, \mathbf{M}_{2 \setminus w})$ forms a Markov
chain, (b) follows since, given the codebook, codewords $(U^n, V^n)$ are functions of $(J(w), K(w)) =
(J(w), M_2(w), S(w), S'(w))$, and from Fano's inequality $H(S'(w) | S(w), J(w), M_2(w), Z^n) \le n\epsilon_n$,
which holds because, from the codebook generation, the number of possible codewords $V^n$ for a given
$(J(w), M_2(w), S(w))$ is less than $2^{nI(V;Z|U)}$ and therefore with high probability $V^n$ (and thus $S'(w)$) can
be decoded given $(S(w), J(w), M_2(w), U^n, Z^n)$, (c) follows from bounding the term $H(U^n, V^n, Z^n)$
using properties of jointly typical sequences, i.e.,

$$\begin{aligned}
p(u^n, v^n, z^n) &\le \sum_{x^n(w) \in \mathcal{T}_\epsilon^{(n)}(X | u^n, v^n, z^n)} p(x^n(w), z^n) \\
&\le 2^{-n(H(X,Z) - H(X|U,V,Z) - 2\delta_\epsilon)} = 2^{-n(I(X;U,V) + H(Z|U,V) - 2\delta_\epsilon)},
\end{aligned}$$

where the equality holds since $Z - X - (U, V)$, from the codebook generation, and from Lemma 1, and
finally (d) follows from the codebook generation where $S(w) \in [1 : 2^{n(I(Y;V|U) - I(Z;V|U) - \delta_\epsilon)}]$.

Given (15), combining (11) to (14) and invoking the random coding argument completes the achievability
proof.
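The final step (d) of the key leakage bound relies on all single-letter terms cancelling once the key entropy is bounded by the logarithm of the number of subbins. The following check is a sketch added for convenience; it assumes $H(S(w)|\mathcal{C}_n) \le n(I(V;Y|U) - I(V;Z|U) - \delta_\epsilon)$, i.e., that the key alphabet has the size used in the codebook generation:

```latex
\begin{align*}
&\bigl[I(V;Y|U) - I(V;Z|U)\bigr] - \bigl[I(X;U,V) + H(Z|U,V)\bigr]
   + I(X;U) + I(X;V|U,Y) + H(Z|U) \\
&\quad = I(V;Y|U) - I(V;Z|U) - I(X;V|U) + I(X;V|U,Y) + I(V;Z|U) \\
&\quad = I(V;Y|U) - \bigl[I(X;V|U) - I(X;V|U,Y)\bigr] \\
&\quad = 0,
\end{align*}
```

where the first equality uses $I(X;U,V) - I(X;U) = I(X;V|U)$ and $H(Z|U) - H(Z|U,V) = I(V;Z|U)$, and the last uses $I(X;V|U) - I(X;V|U,Y) = I(V;Y|U)$ from the codebook generation. Only the slack terms remain, so under these assumptions any $\delta''_\epsilon \ge 6\delta_\epsilon + \epsilon_n$ suffices.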

Converse: We prove the converse for the average probability of error with respect to a user $W$ randomly
selected with uniform probability over the set $[1 : K]$. Clearly, the average probability of error is less than
or equal to the maximal probability of error over the users, so that the achievable region with respect
to this less restrictive criterion contains the one with respect to the criterion given in our definition. The
converse here implies that the two regions match and therefore completes the proof of the theorem.
Conditioned on $W = w$, the joint PMF of all relevant random variables is given by

$$\begin{aligned}
&P_{X^n(1), \ldots, X^n(K), M(1), \ldots, M(K), S(1), \ldots, S(K), Y^n, Z^n | W = w} \\
&\quad = P_{X^n(w), M(w), S(w), Y^n, Z^n | W = w} \prod_{j = 1, j \neq w}^{K} P_{X^n(j), M(j), S(j) | W = w} \\
&\quad = P_{X^n(w), M(w), S(w) | W = w} \, P_{Y^n, Z^n | X^n(w)} \prod_{j = 1, j \neq w}^{K} P_{X^n(j), M(j), S(j) | W = w},
\end{aligned}$$

where $P_{X^n(j)}(x^n) = \prod_{i=1}^n P_X(x_i)$ for $j \in [1 : K]$, and $P_{Y^n, Z^n | X^n(w)}(y^n, z^n | x^n) = \prod_{i=1}^n P_{Y,Z|X}(y_i, z_i | x_i)$.
Let us define $U_i = (W, M(W), Y_{i+1}^n, Z^{i-1})$ and $V_i = (W, M(W), S(W), Y_{i+1}^n, Z^{i-1})$, which satisfy
$U_i - V_i - X_i(W) - (Y_i, Z_i)$ for all $i = 1, \ldots, n$. This can be seen since $U_i$ is included in $V_i$ and $(Y_i, Z_i)$
is independent of $V_i$ given $X_i(W)$ due to the memoryless property of the “channel” $P_{Y,Z|X}$. For any
achievable tuple $(R_I, R_C, L, R_S)$, we have Fano's inequality $H(W, S(W) | \mathbf{M}, Y^n) \le n\epsilon_n$.

It then follows that

\begin{align*}
n(R_I - \delta_n) &\le H(W) = H(W \mid M, Y^n) + I(W; M, Y^n) \\
&\overset{(a)}{\le} n\epsilon_n + I(W; M, Y^n) \\
&\overset{(b)}{=} n\epsilon_n + I(W; Y^n \mid M) \\
&\le n\epsilon_n + H(Y^n) - H(Y^n \mid W, M) \\
&\overset{(c)}{=} n\epsilon_n + H(Y^n) - H(Y^n \mid W, M(W)) \tag{16} \\
&\le \sum_{i=1}^{n} \big[ H(Y_i) - H(Y_i \mid W, M(W), Y_{i+1}^n, Z^{i-1}) \big] + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^{n} I(Y_i; U_i) + n\epsilon_n,
\end{align*}

where (a) follows from Fano's inequality $H(W \mid M, Y^n) \le H(W, S(W) \mid M, Y^n) \le n\epsilon_n$, (b) follows
from the fact that $W$ is independent of $M$, (c) follows from the fact that conditioned on $W = w$, we
have that $Y^n - M(w) - M_{\setminus w}$ forms a Markov chain (see Appendix B (I) for the proof), and (d) follows
from the definition of $U_i$.
Next,

\begin{align*}
n(R_C - R_I + \delta_n) &\ge H(M(W)) - H(W) \\
&\overset{(a)}{\ge} H(M(W)) - I(Y^n; W, M(W)) - n\epsilon_n \\
&= -H(W \mid M(W)) + H(W, M(W) \mid Y^n) - n\epsilon_n \\
&\ge -H(W \mid M(W)) + H(W, M(W), S(W) \mid M_{\setminus W}, Y^n) - H(S(W) \mid M, W, Y^n) \\
&\quad - H(M(W), S(W) \mid M_{\setminus W}, Y^n, X^n(W), Z^n) - n\epsilon_n \\
&\overset{(b)}{=} -H(W \mid M, X^n(W), S(W)) + P \\
&\overset{(c)}{=} -H(W \mid M, X^n(W), S(W), Y^n, Z^n) + P,
\end{align*}

where (a) follows from (16), (b) follows since $W$ is independent of $(M, X^n(W), S(W))$ and from the
definition $P = H(W, M(W), S(W) \mid M_{\setminus W}, Y^n) - H(M(W), S(W) \mid M_{\setminus W}, Y^n, X^n(W), Z^n) - n\epsilon_n
- H(S(W) \mid M, W, Y^n)$, and (c) follows from the fact that we have $I(W; Y^n, Z^n \mid M, X^n(W), S(W)) = 0$,
or equivalently $H(Y^n, Z^n \mid M, X^n(W), S(W)) - H(Y^n, Z^n \mid M, X^n(W), S(W), W) \le 0$, which holds
since i) conditioned on $W = w$, $(Y^n, Z^n) - X^n(w) - (M, S(w))$ forms a Markov chain (see Appendix
B (II)) and ii) we have the Markov chain $W - X^n(W) - (Y^n, Z^n)$ derived from the given "channel"
$P_{Y,Z|X}$.

Continuing the chain of inequalities and substituting the value of $P$, we get

\begin{align*}
n(R_C - R_I + \delta_n) &\ge H(W, M(W), S(W) \mid M_{\setminus W}, Y^n) - H(S(W) \mid M, W, Y^n) \\
&\quad - H(W, M(W), S(W) \mid M_{\setminus W}, Y^n, X^n(W), Z^n) - n\epsilon_n \\
&\overset{(d)}{\ge} I(W, M(W), S(W); X^n(W), Z^n \mid M_{\setminus W}, Y^n) - 2n\epsilon_n \\
&\overset{(e)}{\ge} H(X^n(W), Z^n \mid W, Y^n) - H(X^n(W), Z^n \mid W, M, S(W), Y^n) - 2n\epsilon_n \\
&\overset{(f)}{=} H(X^n(W), Z^n \mid Y^n) - H(X^n(W), Z^n \mid W, M, S(W), Y^n) - 2n\epsilon_n \\
&\ge \sum_{i=1}^{n} \big[ H(X_i(W), Z_i \mid Y_i) - H(X_i(W), Z_i \mid W, M(W), S(W), Y_i^n, Z^{i-1}) \big] - 2n\epsilon_n \\
&\overset{(g)}{\ge} \sum_{i=1}^{n} I(X_i(W), Z_i; V_i \mid Y_i) - 2n\epsilon_n,
\end{align*}

where (d) follows from Fano's inequality where $H(S(W) \mid M, W, Y^n) \le H(W, S(W) \mid M, Y^n) \le n\epsilon_n$,
(e) follows from the fact that $H(X^n(W), Z^n \mid M_{\setminus W}, Y^n) \ge H(X^n(W), Z^n \mid M_{\setminus W}, Y^n, W)$ and that
conditioned on $W = w$, we have the Markov chain $(X^n(w), Z^n) - Y^n - M_{\setminus w}$ (see Appendix B (III)), (f)
follows from the facts that $W$ is independent of $X^n(W)$ and the Markov chain $W - X^n(W) - (Y^n, Z^n)$,
and finally (g) follows from the definition of $V_i$.

The information leakage can be bounded as follows.

\begin{align*}
n(L + \delta_n) &\ge \max_{w} I(X^n(W); M, Z^n) \ge \frac{1}{K} \sum_{w=1}^{K} I(X^n(w); M, Z^n \mid W = w) \\
&= I(X^n(W); M, Z^n \mid W) \\
&\overset{(a)}{=} I(X^n(W); W, M, Z^n) \\
&= I(X^n(W); W, M, S(W), Y^n) - I(X^n(W); S(W) \mid W, M, Y^n) \\
&\quad - I(X^n(W); Y^n \mid W, M) + I(X^n(W); Z^n \mid W, M) \\
&\overset{(b)}{\ge} I(X^n(W); W, M, S(W), Y^n) - n\epsilon_n - I(X^n(W); Y^n \mid W, M) + I(X^n(W); Z^n \mid W, M) \\
&\overset{(c)}{\ge} I(X^n(W); W, M(W), S(W), Y^n) - n\epsilon_n - I(X^n(W); Y^n \mid W, M(W)) + I(X^n(W); Z^n \mid W, M(W)),
\end{align*}

where (a) follows from the fact that $X^n(W)$ is independent of $W$, (b) follows from Fano's inequality,
$H(S(W) \mid W, M, Y^n) \le H(W, S(W) \mid M, Y^n) \le n\epsilon_n$, and (c) follows from the fact that conditioned on
$W = w$, we have the Markov chain $(X^n(w), Y^n, Z^n, S(w)) - M(w) - M_{\setminus w}$ (see Appendix B (IV)).
Continuing the chain of inequalities, we have

\begin{align*}
n(L + \delta_n) &\ge \sum_{i=1}^{n} \big[ H(X_i(W)) - H(X_i(W) \mid W, M(W), S(W), X^{i-1}(W), Y^n) - H(Y_i \mid W, M(W), Y_{i+1}^n) \\
&\quad + H(Y_i \mid W, M(W), Y_{i+1}^n, X^n(W)) + H(Z_i \mid W, M(W), Z^{i-1}) - H(Z_i \mid W, M(W), Z^{i-1}, X^n(W)) \big] - n\epsilon_n \\
&\overset{(d)}{\ge} \sum_{i=1}^{n} \big[ H(X_i(W)) - H(X_i(W) \mid W, M(W), S(W), X^{i-1}(W), Y^n, Z^{i-1}) - I(Y_i; X_i(W)) \\
&\quad + I(Y_i; W, M(W), Y_{i+1}^n) + I(Z_i; X_i(W)) - I(Z_i; W, M(W), Z^{i-1}) \big] - n\epsilon_n \\
&\overset{(e)}{\ge} \sum_{i=1}^{n} \big[ I(X_i(W); W, M(W), S(W), Y_i^n, Z^{i-1}) - I(Y_i; X_i(W)) + I(Z_i; X_i(W)) \\
&\quad + I(Y_i; W, M(W), Z^{i-1}, Y_{i+1}^n) - I(Z_i; W, M(W), Z^{i-1}, Y_{i+1}^n) \big] - n\epsilon_n \\
&\overset{(f)}{=} \sum_{i=1}^{n} \big[ I(X_i(W); V_i, Y_i) - I(X_i(W); Y_i \mid U_i) + I(Z_i; X_i(W) \mid U_i) \big] - n\epsilon_n,
\end{align*}

where (d) follows from the fact that conditioned on $W = w$, we have the Markov chains
$X_i(w) - (M(w), S(w), X^{i-1}(w), Y^n) - Z^{i-1}$ and $(Y_i, Z_i) - X_i(w) - (M(w), Z^{i-1}, X^{n \setminus i}(w))$, which hold
due to the memoryless properties of the "channel" $P_{Y^n,Z^n|X^n(w)}(y^n, z^n | x^n) = \prod_{i=1}^{n} P_{Y,Z|X}(y_i, z_i | x_i)$,
and from the Markov chain $(Y_i, Z_i) - X_i(W) - W$, (e) follows from Csiszár's sum identity [24],
which in this case is $\sum_{i=1}^{n} \big[ I(Y_i; Z^{i-1} \mid W, M(W), Y_{i+1}^n) - I(Z_i; Y_{i+1}^n \mid W, M(W), Z^{i-1}) \big] = 0$, and finally
(f) follows from the definitions of $U_i$ and $V_i$, and the Markov chain $U_i - X_i(W) - (Y_i, Z_i)$.
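
The sum identity used in step (e) can be checked numerically on a small example. The sketch below is only an illustration, not part of the proof: it draws a random joint PMF over binary variables $(T, Y_1, \ldots, Y_n, Z_1, \ldots, Z_n)$, where $T$ is a stand-in (our assumption) for the pair $(W, M(W))$, and evaluates both sides of the identity. The helper names `entropy`, `cond_mi`, and `csiszar_gap` are hypothetical.

```python
import itertools
import math
import random


def entropy(pmf, keep):
    """Entropy in bits of the marginal over the coordinate indices in `keep`."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are index lists."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def csiszar_gap(n=3, seed=0):
    """Difference between the two sides of the sum identity for a random PMF.

    Coordinates: index 0 is T, indices 1..n are Y_1..Y_n,
    indices n+1..2n are Z_1..Z_n (all binary, an illustrative assumption).
    """
    rng = random.Random(seed)
    outcomes = list(itertools.product([0, 1], repeat=2 * n + 1))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    pmf = {o: w / total for o, w in zip(outcomes, weights)}

    lhs = rhs = 0.0
    for i in range(1, n + 1):
        y_future = list(range(i + 1, n + 1))        # Y_{i+1}^n
        z_past = [n + k for k in range(1, i)]       # Z^{i-1}
        lhs += cond_mi(pmf, [i], z_past, [0] + y_future)
        rhs += cond_mi(pmf, [n + i], y_future, [0] + z_past)
    return lhs - rhs


if __name__ == "__main__":
    print(abs(csiszar_gap()) < 1e-9)
```

The two sums agree up to floating-point error for any joint PMF, which is why the corresponding terms cancel in the leakage bound.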

Lastly, the secret key rate can be bounded as follows.

\begin{align*}
n(R_S - \delta_n) &\le \min_{w} H(S(W)) \le \frac{1}{K} \sum_{w=1}^{K} H(S(w) \mid W = w) = H(S(W) \mid W) \\
&= H(S(W) \mid W, M, Z^n) + I(S(W); M, Z^n \mid W) \\
&= H(S(W) \mid W, M, Z^n) + \frac{1}{K} \sum_{w=1}^{K} I(S(w); M, Z^n \mid W = w) \\
&\le H(S(W) \mid W, M, Z^n) + \max_{w} I(S(W); M, Z^n) \\
&\overset{(a)}{\le} H(S(W) \mid W, M, Z^n) + n\delta_n \tag{17} \\
&\overset{(b)}{\le} H(S(W) \mid W, M, Z^n) - H(S(W) \mid W, M, Y^n) + n\delta_n + n\epsilon_n \\
&= I(S(W); Y^n \mid W, M) - I(S(W); Z^n \mid W, M) + n\delta_n + n\epsilon_n \\
&\overset{(c)}{=} I(S(W); Y^n \mid W, M(W)) - I(S(W); Z^n \mid W, M(W)) + n\delta_n + n\epsilon_n \\
&= \sum_{i=1}^{n} \big[ I(S(W); Y_i \mid W, M(W), Y_{i+1}^n) - I(S(W); Z_i \mid W, M(W), Z^{i-1}) \big] + n\delta_n + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^{n} \big[ I(S(W); Y_i \mid W, M(W), Y_{i+1}^n, Z^{i-1}) - I(S(W); Z_i \mid W, M(W), Y_{i+1}^n, Z^{i-1}) \big] + n\delta_n + n\epsilon_n \\
&\overset{(e)}{=} \sum_{i=1}^{n} \big[ I(V_i; Y_i \mid U_i) - I(V_i; Z_i \mid U_i) \big] + n\delta_n + n\epsilon_n, \tag{18}
\end{align*}

where (a) follows from the key leakage constraint, (b) follows from Fano's inequality, (c) follows from
the fact that conditioned on $W = w$, we have the Markov chain $(S(w), Y^n, Z^n) - M(w) - M_{\setminus w}$ (cf.
Appendix B (IV)), (d) follows from Csiszár's sum identity, and (e) follows from the definitions of
$U_i$ and $V_i$.

The proof ends with the standard steps for single letterization using a time-sharing random variable
and letting $\delta_n, \epsilon_n \to 0$ as $n \to \infty$. The cardinality bounds on the sets $\mathcal{U}$ and $\mathcal{V}$ can be proved using the
support lemma [24], and are shown in Appendix C. ■

C. Binary Example 

To demonstrate the tradeoff derived in Theorem 1, we consider simple binary examples of the special
cases in Remark 2 i) and ii) where the Markov chain $X - Y - Z$ holds, i.e., $X \sim$ Bernoulli(1/2), $Y$ is an
erased version of $X$ with erasure probability $p$, and $Z$ is an erased version of $Y$ with erasure probability
$q$.

1) When there is no identification rate constraint, the region $\mathcal{R}_{\mathrm{i}, X-Y-Z}$ in Remark 2 i) reduces to the
set of all $(R_C, L, R_S)$ such that

\begin{align*}
R_C &\ge p(1 - h(\alpha)), \\
L &\ge (1 - q)(1 - p) + p(1 - h(\alpha)), \\
R_S &\le q(1 - p)(1 - h(\alpha)),
\end{align*}

for some $\alpha \in [0, 1/2]$, where $h(\cdot)$ is the binary entropy function. The proof is given in Appendix D,
where setting $U = \emptyset$ in Remark 2 i) is optimal. We can see, for example, the tradeoff between the
secret key rate and the leakage rate, i.e., to achieve a high secret key rate, we need to operate at a
higher compression rate and also allow a higher amount of information leakage.

2) When there is no key rate constraint, the region $\mathcal{R}_{\mathrm{ii}, X-Y-Z}$ in Remark 2 ii) reduces to the set of
all $(R_I, R_C, L)$ such that

\begin{align*}
R_I &\le (1 - p)(1 - h(\alpha)), \\
R_C &\ge R_I + p(1 - h(\alpha)), \\
L &\ge 1 - h(\alpha)((1 - p)q + p),
\end{align*}

for some $\alpha \in [0, 1/2]$. The proof follows similarly to that of $\mathcal{R}_{\mathrm{i}, X-Y-Z}$ and is therefore omitted.
We can see a similar tradeoff between the identification rate and the leakage rate, e.g., to achieve a
high identification rate, we pay the cost of having a high information leakage rate.
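
As a rough illustration of the two tradeoffs, the following sketch evaluates the boundary expressions of cases 1) and 2) for a given crossover parameter $\alpha$. The function names are hypothetical and chosen only for this example; the formulas are the ones stated in the two cases.

```python
import math


def h(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def no_identification_corner(p, q, a):
    """Boundary point (R_C, L, R_S) of case 1) for parameter a in [0, 1/2]."""
    c = 1 - h(a)
    return p * c, (1 - q) * (1 - p) + p * c, q * (1 - p) * c


def no_key_corner(p, q, a):
    """Boundary point (R_I, R_C, L) of case 2), with R_C at its minimum."""
    c = 1 - h(a)
    r_i = (1 - p) * c
    return r_i, r_i + p * c, 1 - h(a) * ((1 - p) * q + p)


if __name__ == "__main__":
    # Sweep a from 0 (V reveals X) to 1/2 (V independent of X).
    for a in (0.0, 0.1, 0.25, 0.5):
        print(a, no_identification_corner(0.2, 0.5, a), no_key_corner(0.2, 0.5, a))
```

Moving $\alpha$ from 1/2 towards 0 raises the key rate (or identification rate) and, at the same time, the required compression rate and the leakage, which is the tradeoff described above.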

III. Secure Identification/Authentication with a Privacy Constraint 

In this section we consider a new problem where the adversary is assumed to be active and tries to 
deceive the identification/authentication system using its own data. The main difference from the previous 
problem is that we impose a constraint on the false acceptance probability, replacing constraints on the 
secret key rate and key leakage. 


A. Problem Formulation 


Let us now consider a secure identification/authentication system as shown in Fig. 1 with an active
adversary. Source, measurement, and side information alphabets, $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, are assumed to be finite. The
users' source sequences $X^n(w)$ for $w \in \mathcal{W}^{(n)} = [1 : K]$ are independent across the users and have
i.i.d. components distributed according to some fixed source distribution $P_X$. The measurement sequence and
side information $(Y^n, Z^n)$ are assumed to be outputs of the memoryless channel with given transition
probability $P_{Y,Z|X}$ and input $X^n(W)$, where $W$ is the index representing an arbitrary unknown user who
presents itself to the system for identification/authentication.

The enrollment and identification/authentication phases follow similarly as in Section II-A. In the event
of an attack, the adversary presents to the decoder its own sequence $y^n \in \mathcal{Y}^n$, generated as a function
of $M$ and $Z^n$, in order to gain access to the system. In this case, the adversary will first be identified
as one of the users according to the decoding function $g_{\mathrm{Id}}^{(n)}(M, y^n)$. Its estimate of the key is equal to
$g_{\mathrm{Au}}^{(n)}(M, y^n)$, which will then be compared with the original key of the user whom it is identified to be, i.e.,
$S(g_{\mathrm{Id}}^{(n)}(M, y^n))$. We define a false acceptance event to be the event that $g_{\mathrm{Au}}^{(n)}(M, y^n) = S(g_{\mathrm{Id}}^{(n)}(M, y^n))$.
Operationally, it means that the adversary gains access as if it were user $g_{\mathrm{Id}}^{(n)}(M, y^n)$. The maximum
false acceptance probability (mFAP) is defined as $\max_{y^n(M, Z^n) \in \mathcal{Y}^n} P\big(g_{\mathrm{Au}}^{(n)}(M, y^n) = S(g_{\mathrm{Id}}^{(n)}(M, y^n))\big)$. 2
Since the adversary will be identified as one of the users in the database, we are concerned about whether
it will also be positively authenticated and therefore wish the maximum false acceptance
probability to decay exponentially.

As before, the information leakage rate of user $W$ at the adversary who has access to the stored database
$M$ and side information $Z^n$ is given by the mutual information rate $I(X^n(W); M, Z^n)/n$.

We are interested in characterizing the optimal tradeoff between the identification rate, compression
rate, information leakage rate, and mFAP exponent.

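Definition 3: A $(|\mathcal{W}^{(n)}|, n)$-code for secure identification and authentication with a privacy
constraint consists of

• A set of stochastic encoders, one for each $w \in \mathcal{W}^{(n)}$, such that the $w$-th encoder takes $X^n(w)$ as input and
generates $(M(w), S(w)) \in \mathcal{M}^{(n)} \times \mathcal{S}^{(n)}$ according to a conditional PMF $p(m(w), s(w) | x^n(w))$.

• A decoder $g_{\mathrm{Id}}^{(n)} : (\mathcal{M}^{(n)})^K \times \mathcal{Y}^n \to \mathcal{W}^{(n)}$, such that the identified user is $\hat{W} = g_{\mathrm{Id}}^{(n)}(M, Y^n)$.

• A decoder $g_{\mathrm{Au}}^{(n)} : (\mathcal{M}^{(n)})^K \times \mathcal{Y}^n \to \mathcal{S}^{(n)}$, such that the estimated secret key is $\hat{S} = g_{\mathrm{Au}}^{(n)}(M, Y^n)$.

◊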

Definition 4: An identification-compression-leakage-mFAP exponent tuple $(R_I, R_C, L, E) \in \mathbb{R}_+^4$ is
said to be achievable if, for any $\delta > 0$, there exists a sequence of $(|\mathcal{W}^{(n)}|, n)$-codes such that
for all sufficiently large $n$,

\begin{align*}
\max_{w \in \mathcal{W}^{(n)}} P\big((\hat{W}, \hat{S}) \ne (W, S(W))\big) &\le \delta, \tag{19} \\
\frac{1}{n} \log |\mathcal{W}^{(n)}| &\ge R_I - \delta, \tag{20} \\
\frac{1}{n} \log |\mathcal{M}^{(n)}| &\le R_C + \delta, \tag{21} \\
\max_{w \in \mathcal{W}^{(n)}} \frac{1}{n} I(X^n(W); M, Z^n) &\le L + \delta, \tag{22} \\
\max_{y^n(M, Z^n) \in \mathcal{Y}^n} P\big(g_{\mathrm{Au}}^{(n)}(M, y^n) = S(g_{\mathrm{Id}}^{(n)}(M, y^n))\big) &\le 2^{-n(E - \delta)}. \tag{23}
\end{align*}

The identification-compression-leakage-mFAP exponent region $\mathcal{R}_2$ is defined as the closure of all
achievable tuples. ◊

2 We note that the maximization here is over the functions $y^n(\cdot)$, not over the sequences in $\mathcal{Y}^n$.


B. Result 

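Theorem 2: The region $\mathcal{R}_2$ for the secure identification/authentication problem defined above is given
by the set of all tuples $(R_I, R_C, L, E) \in \mathbb{R}_+^4$ such that

\begin{align*}
R_I &\le I(Y; U), \tag{24} \\
R_C &\ge R_I + I(X; V \mid Y), \tag{25} \\
L &\ge I(X; V, Y) - I(X; Y \mid U) + I(X; Z \mid U), \tag{26} \\
E &\le I(V; Y \mid U) - I(V; Z \mid U), \tag{27}
\end{align*}

for some $P_{X,Y,Z} P_{V|X} P_{U|V}$ with $|\mathcal{U}| \le |\mathcal{X}| + 4$, $|\mathcal{V}| \le (|\mathcal{X}| + 4)(|\mathcal{X}| + 2)$. □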


Remark 3: The regions $\mathcal{R}_1$ and $\mathcal{R}_2$ specified in Theorems 1 and 2 have the same form. In particular,
the maximum mFAP exponent in Theorem 2 is equivalent to the maximum achievable secret key rate
in Theorem 1. This reveals a connection between the achievable secret key rate and the security of the
identification/authentication system in terms of false acceptance probability.

Intuitively, the equivalence follows from the fact that the coding scheme used to prove Theorem 2 also
achieves negligible key leakage rate for each user, implying that the adversary has no useful knowledge
about the secret key. It can then only guess the secret key $S$ from possible values in a set whose cardinality
is at least $2^{H(S)}$. Therefore, the false acceptance probability is upper-bounded by $2^{-H(S)}$, which is further
bounded by $2^{-n(R_S - \delta)}$ when translating to the problem with the secret key rate constraint. The same
observation holds true when specializing to the single user case [12]. This is also noted in [10] for the
case without adversary's side information.
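
The guessing intuition can be made concrete with a toy calculation. The sketch below is an illustration under our own assumptions (a key with a known PMF and an adversary with no side information, making a single guess); the helper names are hypothetical. For a uniformly distributed key, as produced by a uniform subbin index assignment, the best single guess succeeds with probability exactly $2^{-H(S)}$; for a skewed key the best guess can succeed more often, which suggests why near-uniform keys are the relevant case for this argument.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def best_guess_probability(pmf):
    """Success probability of a single guess by an adversary who knows the PMF."""
    return max(pmf)


if __name__ == "__main__":
    uniform_key = [1 / 8] * 8                  # hypothetical 3-bit uniform key
    skewed_key = [0.5, 0.25, 0.125, 0.125]     # hypothetical non-uniform key
    for name, pmf in (("uniform", uniform_key), ("skewed", skewed_key)):
        print(name, best_guess_probability(pmf), 2 ** (-entropy_bits(pmf)))
```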
Proof of Theorem 2: The proofs of the identification rate, compression rate, and information leakage
rate remain the same as in the previous problem in Section II. We will only provide the proof of the
mFAP exponent, of which the main idea follows similarly as that in [10], [12].

Achievability: We use the same achievable scheme as in the proof of Theorem 1. For an achievable
mFAP exponent, we consider the adversary who knows $\mathbf{m} = (\mathbf{m}_1, \mathbf{m}_2)$ and side information $z^n$. Let
$g_{\mathrm{Id}}(\cdot)$ and $g_{\mathrm{Au}}(\cdot)$ denote the decoding functions for identification and the secret key estimation in the
achievable scheme. The adversary tries to select a sequence $y^n(\mathbf{m}, z^n)$ that results in the estimated key
$g_{\mathrm{Au}}(\mathbf{m}, y^n)$ equal to the original key of the user it is identified to be, i.e., $S(g_{\mathrm{Id}}(\mathbf{m}, y^n))$.

In the achievable scheme, the secret key is chosen as the subbin index of the selected codeword $v^n$.
Thus, the adversary only needs to consider the secret key that results from codewords $v^n$ which are
jointly typical with $x^n$. There are in total $2^{n(I(X;U,V) + 2\delta_\epsilon)}$ such codewords generated.

From the binning scheme with uniform bin and subbin index assignment, we have that the joint
probability that a description $m$ of a certain user $w \in \{1, \ldots, K\}$ is selected and a certain secret key of
that user $s(w)$ is chosen is equal to the total number of jointly typical codewords $v^n$ with corresponding
indices $m(w) = (m_1(w), m_2(w))$ and $s(w)$ divided by the total number of jointly typical codewords $v^n$.
That is,

\begin{equation*}
P(M(w) = m, S(w) = s) \le \frac{\left\lceil \dfrac{P(M(w) = m) \cdot 2^{n(I(X;U,V) + 2\delta_\epsilon)}}{|\mathcal{S}|} \right\rceil}{2^{n(I(X;U,V) + 2\delta_\epsilon)}}. \tag{28}
\end{equation*}

Then

\begin{align*}
\mathrm{mFAP} &= \max_{y^n(M, Z^n)} P\big(g_{\mathrm{Au}}(M, y^n(M, Z^n)) = S(g_{\mathrm{Id}}(M, y^n(M, Z^n)))\big) \\
&= \max_{y^n} \sum_{\mathbf{m}} \sum_{z^n} P\big(M = \mathbf{m}, Z^n = z^n, g_{\mathrm{Au}}(\mathbf{m}, y^n(\mathbf{m}, z^n)) = S(g_{\mathrm{Id}}(\mathbf{m}, y^n(\mathbf{m}, z^n)))\big) \\
&= \max_{y^n} \sum_{\mathbf{m}} \sum_{z^n} P\big(M = \mathbf{m}, S(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = g_{\mathrm{Au}}(\mathbf{m}, y^n)\big) \cdot P\big(Z^n = z^n \mid M = \mathbf{m}, S(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = g_{\mathrm{Au}}(\mathbf{m}, y^n)\big) \\
&\overset{(a)}{=} \max_{y^n \in \mathcal{Y}^n} \sum_{\mathbf{m}} \sum_{z^n} P\big(M_{\setminus g_{\mathrm{Id}}(\mathbf{m}, y^n)} = \mathbf{m}_{\setminus g_{\mathrm{Id}}(\mathbf{m}, y^n)}\big) \cdot P\big(M(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = m(g_{\mathrm{Id}}(\mathbf{m}, y^n)), S(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = g_{\mathrm{Au}}(\mathbf{m}, y^n)\big) \\
&\qquad \cdot P\big(Z^n = z^n \mid M = \mathbf{m}, S(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = g_{\mathrm{Au}}(\mathbf{m}, y^n)\big) \\
&\overset{(b)}{\le} \sum_{m} \frac{\left\lceil \dfrac{P(M(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = m) \cdot 2^{n(I(X;U,V) + 2\delta_\epsilon)}}{|\mathcal{S}|} \right\rceil}{2^{n(I(X;U,V) + 2\delta_\epsilon)}} \\
&\le \sum_{m} \left( \frac{P(M(g_{\mathrm{Id}}(\mathbf{m}, y^n)) = m) \cdot 2^{n(I(X;U,V) + 2\delta_\epsilon)}}{|\mathcal{S}|} + 1 \right) \cdot \frac{1}{2^{n(I(X;U,V) + 2\delta_\epsilon)}} \\
&\overset{(c)}{=} 2^{-n(I(V;Y|U) - I(V;Z|U) - \delta_\epsilon)} + 2^{-n(I(V;Y) - R_I - 3\delta_\epsilon)} \\
&\overset{(d)}{\le} 2^{-n(I(V;Y|U) - I(V;Z|U) - \delta'_\epsilon)},
\end{align*}

where (a) follows since $M_{\setminus g_{\mathrm{Id}}(\mathbf{m}, y^n)}$ is independent of $(M(g_{\mathrm{Id}}(\mathbf{m}, y^n)), S(g_{\mathrm{Id}}(\mathbf{m}, y^n)))$, (b) follows from
the uniform bin and subbin index assignment in the achievable scheme and from the bound in (28),
(c) follows from the code construction where $|\mathcal{S}| = 2^{n(I(V;Y|U) - I(V;Z|U) - \delta_\epsilon)}$ and $|\mathcal{M}| = |\mathcal{M}_1||\mathcal{M}_2| =
2^{n(I(X;V|Y) + R_I + 5\delta_\epsilon)}$, and (d) follows from the constraint $R_I \le I(U; Y)$ derived for the identification rate,
which together with the Markov chain $U - V - Y$ makes $I(V; Y) - R_I \ge I(V; Y \mid U) \ge I(V; Y \mid U) - I(V; Z \mid U)$.

That is, we have

\begin{equation*}
\frac{1}{n} \log \frac{1}{\mathrm{mFAP}} \ge I(V; Y \mid U) - I(V; Z \mid U) - \delta'_\epsilon \ge E - \delta'_\epsilon,
\end{equation*}
if $E \le I(V; Y \mid U) - I(V; Z \mid U)$.
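
The last step of the achievability argument relies on the fact that a sum of two exponentially decaying terms decays at the rate of the slower one. The sketch below illustrates this numerically; it is only an aid to intuition, and the function name and its arguments (`key_exp` standing for $I(V;Y|U) - I(V;Z|U) - \delta_\epsilon$ and `id_exp` for $I(V;Y) - R_I - 3\delta_\epsilon$) are our own labels.

```python
import math


def mfap_exponent(n, key_exp, id_exp):
    """Return (1/n) * log2(1 / (2^(-n*key_exp) + 2^(-n*id_exp))).

    Computed by factoring out the dominant (smaller-exponent) term so that
    large n does not cause numerical trouble.
    """
    low, high = sorted((key_exp, id_exp))
    return low - math.log2(1.0 + 2.0 ** (-n * (high - low))) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mfap_exponent(n, 0.3, 0.5))
```

The returned value always lies between the smaller exponent minus $1/n$ and the smaller exponent itself, so the penalty from the second term vanishes as $n$ grows.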

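Converse: We provide a converse proof for the mFAP exponent. Set $U_i = (W, M(W), Y_{i+1}^n, Z^{i-1})$
and $V_i = (W, M(W), S(W), Y_{i+1}^n, Z^{i-1})$, which satisfy $U_i - V_i - X_i(W) - (Y_i, Z_i)$ for all $i = 1, \ldots, n$.

Let us define the set of secret key messages that can be reconstructed from $\mathbf{m}$, i.e., $\mathcal{C}(\mathbf{m}, w) = \{s(w) :$
there exists a $y^n \in \mathcal{Y}^n$ s.t. $g_{\mathrm{Au}}^{(n)}(\mathbf{m}, y^n) = s(w)\}$. Also, let $C(\cdot)$ be a function of $s(w)$ and $\mathbf{m}$, where
$C(s(w), \mathbf{m}) = 1$ for $s(w) \in \mathcal{C}(\mathbf{m}, w)$, and 0 otherwise. We have that

\begin{align*}
\delta_n &\ge P\big((\hat{S}, \hat{W}) \ne (S(W), W)\big) \\
&\ge P(\hat{S} \ne S(W), \hat{W} = W) \\
&\ge (1 - \delta_n) P(\hat{S} \ne S(W) \mid \hat{W} = W),
\end{align*}

where the last inequality follows from $P(\hat{W} = W) \ge 1 - \delta_n$. Thus, $\delta'_n = \frac{\delta_n}{1 - \delta_n} \ge P(\hat{S} \ne S(W) \mid \hat{W} = W)
\ge \sum_{\mathbf{m}, w} P(M = \mathbf{m}, W = w, S(W) \notin \mathcal{C}(\mathbf{m}, w)) = P(C = 0)$.

Now consider the following bound.

\begin{align*}
\mathrm{mFAP} &= \max_{y^n(M, Z^n) \in \mathcal{Y}^n} P\big(g_{\mathrm{Au}}^{(n)}(M, y^n) = S(g_{\mathrm{Id}}^{(n)}(M, y^n))\big) \\
&= \max_{y^n} \sum_{\mathbf{m}, z^n, w} P\big(M = \mathbf{m}, Z^n = z^n, W = w, g_{\mathrm{Au}}^{(n)}(\mathbf{m}, y^n) = S(g_{\mathrm{Id}}^{(n)}(\mathbf{m}, y^n))\big) \\
&\overset{(a)}{\ge} \sum_{\mathbf{m}, z^n, w} P\big(M = \mathbf{m}, Z^n = z^n, W = w, S(w) = \hat{s}(\mathbf{m}, z^n, w)\big) \\
&\overset{(b)}{=} \sum_{\mathbf{m}, z^n, w} p(\mathbf{m}, z^n, w) \max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w) \\
&\ge \sum_{\mathbf{m}, z^n, w} p(\mathbf{m}, z^n, w) \max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w), C = 1 \mid \mathbf{m}, z^n, w) \\
&\ge \sum_{\mathbf{m}, z^n, w} p(\mathbf{m}, z^n, w) \, p(C = 1 \mid \mathbf{m}, z^n, w) \cdot \max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w, C = 1), \tag{29}
\end{align*}

where in (a), the adversary who knows $\mathbf{m}$ and $z^n$ may choose $y^n$ that results in $g_{\mathrm{Id}}^{(n)}(\mathbf{m}, y^n) = w$, and
the corresponding MAP estimate of $s(w)$, i.e.,

\begin{equation*}
g_{\mathrm{Au}}^{(n)}(\mathbf{m}, y^n) = \hat{s}(\mathbf{m}, z^n, w) = \arg\max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w), \tag{30}
\end{equation*}

and (b) follows from (30).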

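Then for any achievable $E$, it follows that

\begin{align*}
n(E - \delta_n) &\overset{(a)}{\le} -\log(P(C = 1)) - \log \Big( \sum_{\mathbf{m}, z^n, w} p(\mathbf{m}, z^n, w \mid C = 1) \cdot \max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w, C = 1) \Big) \\
&\overset{(b)}{\le} -\log(1 - \delta'_n) - \sum_{\mathbf{m}, z^n, w} p(\mathbf{m}, z^n, w \mid C = 1) \cdot \log \Big( \max_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w, C = 1) \Big) \\
&\le -\log(1 - \delta'_n) - \sum_{\mathbf{m}, z^n, w} \Big( p(\mathbf{m}, z^n, w \mid C = 1) \cdot \sum_{s(w) \in \mathcal{C}(\mathbf{m}, w)} p(s(w) \mid \mathbf{m}, z^n, w, C = 1) \cdot \log p(s(w) \mid \mathbf{m}, z^n, w, C = 1) \Big) \\
&= -\log(1 - \delta'_n) + H(S(W) \mid M, Z^n, W, C = 1),
\end{align*}

where (a) follows from (29) and (b) follows from $P(C = 1) \ge 1 - \delta'_n$ and Jensen's inequality [25].
Continuing the chain of inequalities with the fact that

\begin{equation*}
(1 - \delta'_n) H(S(W) \mid M, Z^n, W, C = 1) \le P(C = 1) H(S(W) \mid M, Z^n, W, C = 1) \le H(S(W) \mid M, Z^n, W),
\end{equation*}

we get

\begin{align*}
(1 - \delta'_n) \cdot [n(E - \delta_n) + \log(1 - \delta'_n)] &\le H(S(W) \mid M, Z^n, W) \\
&\overset{(a)}{\le} \sum_{i=1}^{n} \big[ I(V_i; Y_i \mid U_i) - I(V_i; Z_i \mid U_i) \big] + n\epsilon_n,
\end{align*}

where (a) follows from the steps (17) to (18).

The proof ends with the standard steps for single letterization using a time-sharing random variable
and letting $\delta_n, \epsilon_n \to 0$ as $n \to \infty$. ■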


IV. Conclusion 

We studied two related problems of secret key-based identification and authentication under a privacy 
constraint on the enrolled source data. An adversary is assumed to have access to the stored database of 
helping data and the “online” side information correlated with the user’s data. First, we considered the 
case where the adversary is passive and characterized the optimal tradeoff region of the identification
rate, compression rate, leakage rate, and secret key rate for discrete memoryless sources. Then we 
considered a variant of the problem where the adversary is active in the sense that it tries to deceive the 
identification/authentication using its own data. In this problem, we characterized the optimal tradeoff 
between the identification rate, compression rate, leakage rate, and mFAP exponent. Both results are 
derived based on the same achievability scheme involving layered random binning and rate allocation 
technique applied on the first layered codeword. They shed light on whether one should aim to design the 
secret key-based identification/authentication system to achieve the highest secret key rate, given that the secret
key here is used not for encryption but only for authentication. It turns out that the maximum secret
key rate in the first problem is equivalent to the maximum achievable mFAP exponent in the second one, 
revealing a close connection between security of identification/authentication system and the maximum 
achievable secret key rate. 


Appendix A 
Proof of Lemma 1 

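Let $T$ be a binary random variable taking value 0 if $(U^n(J(w)), Z^n) \in \mathcal{T}_\epsilon^{(n)}(U, Z)$, and 1 otherwise. Since
$(X^n(w), U^n(J(w)), V^n(J(w), K(w)), Y^n, Z^n) \in \mathcal{T}_\epsilon^{(n)}$ with high probability, we have $P(T = 1) \le \delta_\epsilon$.

It follows that

\begin{align*}
H(Z^n \mid J(w)) &\overset{(a)}{\le} H(Z^n, T \mid J(w), U^n) \\
&\le H(Z^n \mid U^n, T) + H(T) \\
&= P(T = 0) H(Z^n \mid U^n, T = 0) + P(T = 1) H(Z^n \mid U^n, T = 1) + H(T) \\
&\overset{(b)}{\le} H(Z^n \mid U^n, T = 0) + \delta_\epsilon H(Z^n) + h(\delta_\epsilon) \\
&\le H(Z^n \mid U^n, T = 0) + n\delta_\epsilon \log|\mathcal{Z}| + h(\delta_\epsilon) \\
&= \sum_{u^n \in \mathcal{T}_\epsilon^{(n)}} p(u^n \mid T = 0) H(Z^n \mid U^n = u^n, T = 0) + n\delta_\epsilon \log|\mathcal{Z}| + h(\delta_\epsilon) \\
&\le \sum_{u^n \in \mathcal{T}_\epsilon^{(n)}} p(u^n \mid T = 0) \log|\mathcal{T}_\epsilon^{(n)}(Z \mid u^n)| + n\delta_\epsilon \log|\mathcal{Z}| + h(\delta_\epsilon) \overset{(c)}{\le} n(H(Z|U) + \delta'_\epsilon),
\end{align*}

where (a) follows from the fact that given the codebook, $U^n$ is a function of $J(w)$, (b) follows from
$P(T = 1) \le \delta_\epsilon$, where $h(\cdot)$ is the binary entropy function, and (c) follows from the property of the jointly
typical set [21] with $\delta_\epsilon, \delta'_\epsilon \to 0$ as $\epsilon \to 0$, and $\epsilon \to 0$ as $n \to \infty$.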
Appendix B

Proof of Markov Chains in Converse of Theorem 1

We prove the Markov chains used in the converse proof of Theorem 1 based on the fact that, conditioned
on $W = w$, the joint PMF of $(X^n(w), M, S(w), Y^n, Z^n)$ is given by

\begin{align*}
&P(X^n(w) = x^n, M = \mathbf{m}, S(w) = s, Y^n = y^n, Z^n = z^n \mid W = w) \\
&\quad = P(X^n(w) = x^n, M(w) = m_w, S(w) = s \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w) \\
&\qquad \cdot P(Y^n = y^n, Z^n = z^n \mid X^n(w) = x^n).
\end{align*}

(I) $Y^n - M(w) - M_{\setminus w}$

Proof: Conditioned on $W = w$, we write the joint PMF of $(Y^n, M)$ as

\begin{align*}
&P(Y^n = y^n, M = \mathbf{m} \mid W = w) \\
&\quad = \sum_{x^n} P(X^n(w) = x^n, M(w) = m_w \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w) \, P(Y^n = y^n \mid X^n(w) = x^n) \\
&\quad = P(M(w) = m_w, Y^n = y^n \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w),
\end{align*}

which implies that $Y^n - M(w) - M_{\setminus w}$ forms a Markov chain. ■

(II) $(M, S(w)) - X^n(w) - (Y^n, Z^n)$

Proof: Conditioned on $W = w$, we write the joint PMF of $(X^n(w), M, S(w), Y^n, Z^n)$ as

\begin{align*}
&P(X^n(w) = x^n, M = \mathbf{m}, S(w) = s, Y^n = y^n, Z^n = z^n \mid W = w) \\
&\quad = P(X^n(w) = x^n, M(w) = m_w, S(w) = s \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w) \\
&\qquad \cdot P(Y^n = y^n, Z^n = z^n \mid X^n(w) = x^n) \\
&\quad = P(X^n(w) = x^n, M = \mathbf{m}, S(w) = s \mid W = w) \, P(Y^n = y^n, Z^n = z^n \mid X^n(w) = x^n),
\end{align*}

which implies the Markov chain $(M, S(w)) - X^n(w) - (Y^n, Z^n)$. ■

(III) $(X^n(w), Z^n) - Y^n - M_{\setminus w}$

Proof: Conditioned on $W = w$, we write the joint PMF of $(X^n(w), M_{\setminus w}, Y^n, Z^n)$ as

\begin{align*}
&P(X^n(w) = x^n, M_{\setminus w} = \mathbf{m}_{\setminus w}, Y^n = y^n, Z^n = z^n \mid W = w) \\
&\quad = P(X^n(w) = x^n, Y^n = y^n, Z^n = z^n \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w),
\end{align*}

which implies the Markov chain $(X^n(w), Z^n) - Y^n - M_{\setminus w}$.

(IV) $(X^n(w), Y^n, Z^n, S(w)) - M(w) - M_{\setminus w}$

Proof: Conditioned on $W = w$, we write the joint PMF of $(X^n(w), M, S(w), Y^n, Z^n)$ as

\begin{align*}
&P(X^n(w) = x^n, M = \mathbf{m}, S(w) = s, Y^n = y^n, Z^n = z^n \mid W = w) \\
&\quad = P(X^n(w) = x^n, M(w) = m_w, S(w) = s \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w) \\
&\qquad \cdot P(Y^n = y^n, Z^n = z^n \mid X^n(w) = x^n) \\
&\quad = P(X^n(w) = x^n, M(w) = m_w, S(w) = s, Y^n = y^n, Z^n = z^n \mid W = w) \, P(M_{\setminus w} = \mathbf{m}_{\setminus w} \mid W = w),
\end{align*}

which implies the Markov chain $(X^n(w), Y^n, Z^n, S(w)) - M(w) - M_{\setminus w}$. ■

Appendix C 

Cardinality Bounds of The Sets U and V in Theorem 2 
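Consider the expression of $\mathcal{R}_1$ in Theorem 1:

\begin{align*}
R_I &\le I(Y; U), \\
R_C &\ge R_I + I(X; V \mid Y), \\
L &\ge I(X; V, Y) - I(X; Y \mid U) + I(X; Z \mid U), \\
R_S &\le I(V; Y \mid U) - I(V; Z \mid U),
\end{align*}

for some $U \in \mathcal{U}$, $V \in \mathcal{V}$ such that $U - V - X - (Y, Z)$ forms a Markov chain.

We can rewrite some mutual information terms in the expression above as

\begin{align*}
R_I &\le H(Y) - H(Y \mid U), \\
R_C &\ge R_I + H(X \mid Y) - H(X, Y \mid V) + H(Y \mid V), \\
L &\ge H(X) - H(X, Y \mid V) + H(Y \mid V) - H(Y \mid U) + H(Y \mid X) + H(Z \mid U) - H(Z \mid X), \\
R_S &\le H(Y \mid U) - H(Y \mid V) - H(Z \mid U) + H(Z \mid V).
\end{align*}

We will show that the random variables $U$ and $V$ may be replaced by new ones, satisfying $|\mathcal{U}| \le
|\mathcal{X}| + 4$, $|\mathcal{V}| \le (|\mathcal{X}| + 4)(|\mathcal{X}| + 2)$, and preserving the terms $H(X, Y \mid V)$, $H(Y \mid V)$, $H(Z \mid V)$, $H(Y \mid U)$,
and $H(Z \mid U)$.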

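First, we bound the cardinality of the set $\mathcal{U}$. Let us define the following $|\mathcal{X}| + 4$ continuous functions
of $p(v|u)$, $v \in \mathcal{V}$,

\begin{align*}
f_j(p(v|u)) &= \sum_{v \in \mathcal{V}} p(v|u) p(x|u, v), \quad j = 1, \ldots, |\mathcal{X}| - 1, \\
f_{|\mathcal{X}|}(p(v|u)) &= H(X, Y \mid V, U = u) = H(X, Y, V \mid U = u) - H(V \mid U = u), \\
f_{|\mathcal{X}|+1}(p(v|u)) &= H(Y \mid V, U = u) = H(Y, V \mid U = u) - H(V \mid U = u), \\
f_{|\mathcal{X}|+2}(p(v|u)) &= H(Z \mid V, U = u) = H(Z, V \mid U = u) - H(V \mid U = u), \\
f_{|\mathcal{X}|+3}(p(v|u)) &= H(Y \mid U = u), \\
f_{|\mathcal{X}|+4}(p(v|u)) &= H(Z \mid U = u).
\end{align*}

The corresponding averages are

\begin{align*}
\sum_{u \in \mathcal{U}} p(u) f_j(p(v|u)) &= P_X(x), \quad j = 1, \ldots, |\mathcal{X}| - 1, \\
\sum_{u \in \mathcal{U}} p(u) f_{|\mathcal{X}|}(p(v|u)) &= H(X, Y, V \mid U) - H(V \mid U), \\
\sum_{u \in \mathcal{U}} p(u) f_{|\mathcal{X}|+1}(p(v|u)) &= H(Y, V \mid U) - H(V \mid U), \\
\sum_{u \in \mathcal{U}} p(u) f_{|\mathcal{X}|+2}(p(v|u)) &= H(Z, V \mid U) - H(V \mid U), \\
\sum_{u \in \mathcal{U}} p(u) f_{|\mathcal{X}|+3}(p(v|u)) &= H(Y \mid U), \\
\sum_{u \in \mathcal{U}} p(u) f_{|\mathcal{X}|+4}(p(v|u)) &= H(Z \mid U).
\end{align*}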

According to the support lemma [24], we can deduce that there exists a new random variable $U'$ jointly
distributed with $(X, Y, Z, V)$ whose alphabet size is $|\mathcal{U}'| = |\mathcal{X}| + 4$, and numbers $\alpha_i \ge 0$ with
$\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i = 1$ that satisfy

\begin{align*}
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_j(P_{V|U'}(v|i)) &= P_X(x), \quad j = 1, \ldots, |\mathcal{X}| - 1, \\
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_{|\mathcal{X}|}(P_{V|U'}(v|i)) &= H(X, Y, V \mid U') - H(V \mid U'), \\
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_{|\mathcal{X}|+1}(P_{V|U'}(v|i)) &= H(Y, V \mid U') - H(V \mid U'), \\
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_{|\mathcal{X}|+2}(P_{V|U'}(v|i)) &= H(Z, V \mid U') - H(V \mid U'), \\
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_{|\mathcal{X}|+3}(P_{V|U'}(v|i)) &= H(Y \mid U'), \\
\sum_{i=1}^{|\mathcal{X}|+4} \alpha_i f_{|\mathcal{X}|+4}(P_{V|U'}(v|i)) &= H(Z \mid U').
\end{align*}

Note that we have

\begin{align*}
H(X, Y, V \mid U') - H(V \mid U') &= H(X, Y, V \mid U) - H(V \mid U) \\
&\overset{(a)}{=} H(X, Y \mid V),
\end{align*}

where (a) follows from the Markov chain $U - V - X - (Y, Z)$. Similarly, from the Markov chain
$U - V - X - (Y, Z)$, we have that $H(Y, V \mid U') - H(V \mid U') = H(Y, V \mid U) - H(V \mid U) = H(Y \mid V)$, and
$H(Z, V \mid U') - H(V \mid U') = H(Z, V \mid U) - H(V \mid U) = H(Z \mid V)$. Since $P_X(x)$ is preserved, $P_{X,Y,Z}(x, y, z)$
is also preserved. Thus, $H(X \mid Y)$, $H(Y \mid X)$, $H(Z \mid X)$ are preserved.

Next we bound the cardinality of the set $\mathcal{V}$. For each $u' \in \mathcal{U}'$, we define the following $|\mathcal{X}| + 2$
continuous functions of $p(x|u', v)$, $x \in \mathcal{X}$,

\begin{align*}
f_j(p(x|u', v)) &= p(x|u', v), \quad j = 1, \ldots, |\mathcal{X}| - 1, \\
f_{|\mathcal{X}|}(p(x|u', v)) &= H(X, Y \mid U' = u', V = v), \\
f_{|\mathcal{X}|+1}(p(x|u', v)) &= H(Y \mid U' = u', V = v), \\
f_{|\mathcal{X}|+2}(p(x|u', v)) &= H(Z \mid U' = u', V = v).
\end{align*}

Similarly to the previous part in bounding $|\mathcal{U}|$, there exists a new random variable $V' | \{U' = u'\} \sim p(v'|u')$
such that $|\mathcal{V}'| = |\mathcal{X}| + 2$ and $p(x|u')$, $H(X, Y \mid U' = u', V)$, $H(Y \mid U' = u', V)$, and $H(Z \mid U' = u', V)$ are
preserved.

By setting $V'' = (V', U')$ where $\mathcal{V}'' = \mathcal{V}' \times \mathcal{U}'$, we have that $U' - V'' - X - (Y, Z)$ forms a Markov
chain.

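Furthermore, we have the following preservations by $V''$,

\begin{align*}
H(X, Y \mid V'') &= H(X, Y \mid V', U') \\
&\overset{(a)}{=} H(X, Y \mid V, U') \\
&\overset{(b)}{=} H(X, Y \mid V, U) \\
&\overset{(c)}{=} H(X, Y \mid V),
\end{align*}

where (a) follows from preservation by $V'$, (b) follows from preservation by $U'$, and (c) follows from
the Markov chain $U - V - X - (Y, Z)$. Similarly, from preservation by $U'$ and $V'$, and the Markov chain
$U - V - X - (Y, Z)$, we have that $H(Y \mid V'') = H(Y \mid V', U') = H(Y \mid V)$ and $H(Z \mid V'') = H(Z \mid V', U') =
H(Z \mid V)$.

Therefore, we have shown that $U \in \mathcal{U}$ and $V \in \mathcal{V}$ may be replaced by $U' \in \mathcal{U}'$ and $V'' \in \mathcal{V}''$ satisfying

\begin{align*}
|\mathcal{U}'| &= |\mathcal{X}| + 4, \\
|\mathcal{V}''| &= |\mathcal{U}'||\mathcal{V}'| = (|\mathcal{X}| + 4)(|\mathcal{X}| + 2),
\end{align*}

and preserving the terms $H(X, Y \mid V)$, $H(Y \mid V)$, $H(Z \mid V)$, $H(Y \mid U)$, and $H(Z \mid U)$.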

Appendix D 

Proof of the Compression-leakage-key rate Region in the Binary Example 

Achievability: Let $V$ be an output of a BSC($\alpha$) with input $X$, where $\alpha \in [0, 1/2]$. Then by setting
$U = \emptyset$, it follows from the expression in Remark 2 i) that

\begin{align*}
R_C &\ge I(X; V \mid Y) \\
&\overset{(a)}{=} p \cdot (H(X) - H(X \mid V)) \\
&\overset{(b)}{=} p \cdot (1 - h(\alpha)),
\end{align*}

where (a) follows since $Y = e$ with probability $p$, otherwise $Y = X$, and (b) follows from the choice
of $V$,

\begin{align*}
L &\ge I(X; Z) + I(X; V \mid Y) \\
&\overset{(a)}{=} 1 - H(X \mid Z) + p \cdot (1 - h(\alpha)) \\
&\overset{(b)}{=} 1 - ((1 - p)q + p) + p \cdot (1 - h(\alpha)) \\
&= (1 - q)(1 - p) + p \cdot (1 - h(\alpha)),
\end{align*}

where (a) follows from the bound on $R_C$ and (b) follows since $Z = e$ with probability $(1 - p)q + p$,
otherwise $Z = X$.

\begin{align*}
R_S &\le I(Y; V \mid Z) \\
&\overset{(a)}{=} I(X; V \mid Z) - I(X; V \mid Y) \\
&\overset{(b)}{=} ((1 - p)q + p) \cdot I(X; V) - p \cdot (1 - h(\alpha)) \\
&= q(1 - p)(1 - h(\alpha)),
\end{align*}

where (a) follows from the Markov chain $V - X - Y - Z$ and (b) follows since $Z = e$ with probability
$(1 - p)q + p$, otherwise $Z = X$.
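
The three closed-form expressions above can be cross-checked by brute force. The following sketch, offered as a sanity check rather than as part of the proof, builds the joint PMF of $(X, V, Y, Z)$ for the binary erasure example and evaluates $I(X;V|Y)$, $I(X;Z) + I(X;V|Y)$, and $I(Y;V|Z)$ directly from entropies. All function names are hypothetical, and the erasure symbol is represented by the string `"e"`.

```python
import itertools
import math


def h(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def joint_pmf(p, q, a):
    """Joint PMF of (X, V, Y, Z): X ~ Bernoulli(1/2), V = BSC(a) output of X,
    Y = X erased with probability p, Z = Y erased with probability q."""
    pmf = {}
    for x, v in itertools.product([0, 1], repeat=2):
        p_xv = 0.5 * (a if v != x else 1 - a)
        for y, p_y in ((x, 1 - p), ("e", p)):
            # An erased Y stays erased at Z.
            z_branches = (("e", 1.0),) if y == "e" else ((y, 1 - q), ("e", q))
            for z, p_z in z_branches:
                key = (x, v, y, z)
                pmf[key] = pmf.get(key, 0.0) + p_xv * p_y * p_z
    return pmf


def entropy(pmf, keep):
    """Entropy in bits of the marginal over the coordinate indices in `keep`."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(pr * math.log2(pr) for pr in marginal.values() if pr > 0)


def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are index lists."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def appendix_d_terms(p, q, a):
    """Return (I(X;V|Y), I(X;Z) + I(X;V|Y), I(Y;V|Z)) computed from the PMF."""
    pmf = joint_pmf(p, q, a)
    X, V, Y, Z = [0], [1], [2], [3]
    r_c = cond_mi(pmf, X, V, Y)
    leak = cond_mi(pmf, X, Z, []) + r_c
    r_s = cond_mi(pmf, Y, V, Z)
    return r_c, leak, r_s


if __name__ == "__main__":
    print(appendix_d_terms(0.2, 0.5, 0.1))
```

For any choice of $p$, $q$, and $\alpha$, the three returned values should agree with $p(1 - h(\alpha))$, $(1 - q)(1 - p) + p(1 - h(\alpha))$, and $q(1 - p)(1 - h(\alpha))$ up to floating-point error.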

Converse: Let $(R_C, L, R_S)$ be an achievable tuple. We now prove that there exists $\alpha \in [0, 1/2]$ satisfying
the inequalities shown in the achievability above. From the region specified in Remark 2 i), we have the
following bound on the compression rate $R_C$.

\begin{align*}
R_C &\ge I(X; V \mid Y) \\
&= p \cdot I(X; V) \\
&= p \cdot (1 - H(X \mid V)).
\end{align*}

Since $0 \le H(X \mid V) \le H(X) = 1$, and $h(\cdot)$ is a continuous one-to-one mapping from $[0, 1/2]$ to $[0, 1]$,
there exists $\alpha \in [0, 1/2]$ s.t. $H(X \mid V) = h(\alpha)$, and thus $R_C \ge p \cdot (1 - h(\alpha))$. The bounds on $L$ and $R_S$
readily follow from $H(X \mid V) = h(\alpha)$.